ABSTRACT 

Recommendations are made for data processing 
procedures for the National Center for Education Statistics' (NCES) 
Common Core of Data survey program. Data are anticipated from about 
16,000 local education agencies and 85,000 schools in the 50 states, 
the District of Columbia, and 7 territories. These recommendations 
concern editing and verification performed between the receipt of 
data and the start of substantive analyses — inspecting for suspect 
values, as well as detecting and correcting erroneous values. A flow 
chart details the editing process, which includes online data entry 
edits, preliminary machine edits, and batch production edits. The 
precise nature of, and relationship between, the data entry and 
preliminary edit phases depend on the form in which the data are 
received. Online data entry edits are strongly recommended, since the 
first check will usually be for a keying error and the source 
document is still immediately available. The preliminary edit 
examines the data statistically, searching for frequent or universal 
format-type problems, while the production edit attempts to 
individually identify every field or record that fails an edit. Other 
described processes include automatic correction, relational or 
longitudinal edits, table-driven edits, switch-driven edits, and 
user-oriented interfaces. Specific suggestions are described for two 
surveys: the Universe of Public Schools Survey and the Local 
Education Agency Non-Fiscal Survey. Various survey forms and data 
processing instructions are appended. (GDC) 
PREFACE 



This report has been prepared as part of Task 2D of 
Project SAGE. The edit descriptions in the report will be used 
as draft specifications for the 1978-79 cycle of the Common 
Core of Data (CCD) program. Major data-handling systems are 
constantly undergoing revision, and two factors interacted to 
stimulate enhancement of the existing system at this time. 
First, the survey instruments for Parts VI-A and VI of CCD have 
been frozen until 1981; thus it is possible to concentrate on 
improving the system, rather than racing this year merely to 
adapt the programs to new survey items. Second, resources were 
available, through Project SAGE, to obtain an independent review 
of the existing system from analysts with their own practical 
experience in the survey data-processing field. The resulting 
combination of NCES and SAGE insight will lead to an improved 
product. Finally, it is appropriate to acknowledge two 
individuals who were primarily responsible for the edit proce- 
dures employed last year, who participated in the analyses 
which preceded this report, and who provided the foundation 
for the material developed herein: Warren Hughes and Ted Chmura. 
INTRODUCTION 



Parts VI-A and VI of the Common Core of Data survey program 
are being put through their first full-scale run this year. 
NCES anticipates receipt of data this year on approximately 
16,000 local education agencies (LEAs) and perhaps 85,000 schools 
from 58 states and territories. While these data promise to 
be invaluable in assessing the status of education today and 
in establishing trends for the future, the millions of charac- 
ters of input that are involved will require attentive processing 
from the very outset. The purpose of this report is to present 
recommendations for that processing which is to be performed 
between the receipt of the data and the start of substantive 
analyses; this period of data examination, of inspection for 
suspect values, of detection and correction of erroneous values, 
is referred to as the "edit and verification" stage of process- 
ing. 

The next section presents some background and general 
recommendations regarding the overall structure of the edit 
system. No attempt is made to review every possible practice 
in the design of edit systems; rather, only those instances 
are discussed where the CCD edit system might deviate from more 
conventional approaches or might benefit from special enhance- 
ment. The two remaining sections of the report deal with 
specific draft edit specifications for Parts VI-A and VI respec- 
tively. 



SYSTEM-LEVEL CONSIDERATIONS AND RECOMMENDATIONS 
Background 

According to the system plan for Parts VI-A and VI prepared 
by SAGE (Figures 1a-1f), edit processing is to include three 
major components: edits during data entry, preliminary edits, 
and batch production edits. The precise nature of, and relation- 
ship between, the data entry and preliminary edit phases depend 
on the form in which the data are received (cf. Figure 1c). 
At least three media are expected, and preliminary edits are 
recommended for all. For hard-copy (form or facsimile) and 
shuttle-list data, both manual and machine preliminary edits 
must be prepared. A manual/clerical scan is used to separate 
out forms that are unreadable, are not filled out properly, 
are missing almost all data items, and so on, and to count the 
number of input records (i.e., schools or LEAs). Gross problems 
or problems that would impact on data entry are discovered 
and corrected at this stage. Data entry (Figure 1c) follows 
the clerical screening. The data are input via CRT terminal, 
using any of a number of commercially available data entry 
software packages or services, and a number of edit checks should 
be performed as the data are entered. These include field 
content verification (e.g., only numbers in numeric fields), 
presence checks (e.g., non-optional fields, like LEA identifi- 
cation numbers, are correctly filled out), range checks for 
numerical items, validity checks for coded items (e.g., table 
look-up to verify codes are legal), internal consistency 
checks, and others. The advantage of doing such checks at the 
time of data entry is that the source document is immediately 
available; follow-up is thus facilitated, since the first 
check will generally be for a keying error. When errors of 
these types are only discovered later, during a batch edit 
phase, the source document must be located to check keying, no 
small task given the volume of input to be processed by CCD. 
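
The kinds of data-entry checks named above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the field names (lea_id, grade_low, membership), the table of legal grade codes, and the range limit are hypothetical and are not taken from the CCD record layout.

```python
# Hypothetical data-entry edit. Field names, the code table and the range
# limit are illustrative assumptions, not part of the CCD specification.
LEGAL_GRADE_CODES = {"PK", "KG"} | {f"{g:02d}" for g in range(1, 13)}

def data_entry_errors(record):
    """Return a list of messages for one keyed record (a dict of strings)."""
    errors = []
    lea_id = record.get("lea_id", "").strip()
    if not lea_id:                                        # presence check
        errors.append("lea_id: missing")
    elif not (lea_id.isdecimal() and len(lea_id) == 7):   # field content check
        errors.append("lea_id: must be 7 digits")
    if record.get("grade_low", "") not in LEGAL_GRADE_CODES:  # validity (table look-up)
        errors.append("grade_low: illegal code")
    membership = record.get("membership", "").strip()
    if not membership.isdecimal():                        # field content check
        errors.append("membership: not numeric")
    elif not 0 <= int(membership) <= 10000:               # range check
        errors.append("membership: out of range")
    return errors
```

Because the messages are produced while the source form is still in the operator's hands, a keying error can be confirmed and rekeyed at once.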
Figure 1a. SYSTEM FLOW CHART - PARTS VI AND VI-A, COMMON CORE OF DATA 
Figure 1b. SYSTEM FLOW CHART - PARTS VI AND VI-A, COMMON CORE OF DATA (cont'd.) 
Figure 1c. SYSTEM FLOW CHART - PARTS VI AND VI-A, COMMON CORE OF DATA (cont'd.) 
Figure 1d. SYSTEM FLOW CHART - PARTS VI AND VI-A, COMMON CORE OF DATA (cont'd.) 
Figure 1e. SYSTEM FLOW CHART - PARTS VI AND VI-A, COMMON CORE OF DATA (cont'd.) 
Figure 1f. SYSTEM FLOW CHART - PARTS VI AND VI-A, COMMON CORE OF DATA (cont'd.) 
Data entry is followed by the preliminary computer edit designed 
for machine-readable data.* 

A preliminary edit phase is also called for in dealing 
with data submitted in machine-readable form (Figure 1c). 
Past experience indicates that several states will likely make 
systematic errors in preparing data for submission, and that 
these errors will often cause every (or nearly every) record 
to be flagged by the production edit program, generating 
voluminous error printouts. The preliminary machine edit is 
designed to discover such systematic errors very quickly. For 
example, in last year's public school universe survey several 
states used codes for the grade span which were not specified 
in the instructions (e.g., "K " or " K" for kindergarten, in 
place of the prescribed "KG"); this led to flagging and lengthy 
error printouts for hundreds of otherwise correct records. A 
preliminary edit would have prevented this by providing a fre- 
quency count on all entries, before the production edit phase; 
the use of an alternative code would have been apparent, and 
a transformation/correction program could have quickly been 
prepared. Another example is the state that accidentally 
left blank the field for number of graduates in last year's 
LEA Non-Fiscal Survey. A preliminary presence check on the 
field (i.e., a count of the number of LEAs for which it was 
missing in a given state) would have revealed the problem very 
quickly. 
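
Both examples above amount to simple counts over one state's file. A minimal Python sketch follows; the field names and the sample records are hypothetical, and a real preliminary edit would report on every field rather than on one coded field and one required field.

```python
from collections import Counter

def preliminary_report(records, coded_field, required_field):
    """Summarize one state's file: code frequencies and a missing-value count."""
    # Frequency count on all entries of a coded field, exactly as keyed.
    frequencies = Counter(rec.get(coded_field, "") for rec in records)
    # Presence count: number of records in which the required field is blank.
    missing = sum(1 for rec in records if not rec.get(required_field, "").strip())
    return frequencies, missing
```

A frequency table in which "K " appears hundreds of times and "KG" almost never would expose the alternative code before the production edit is run.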

While the preliminary edit system may check for many of 
the same problems as are examined in the production edit, 



* It would be theoretically possible to do all edits proposed 
in this report at data entry time. This has not been recom- 
mended in this report because NCES hopes that most states 
will ultimately submit their data in machine-readable form. 
Nevertheless, the possibilities inherent in on-line data 
entry given current technology are extensive, and should be 
kept in mind, even as solutions to short-term problems. 
there are critical differences: The preliminary edit examines 
a state's data statistically, searching for frequent or univer- 
sal, format-type problems; the production edit attempts to 
individually identify every field and record that fail an edit, 
and to correct many kinds of edit failures, automatically where 
possible, and manually when required. Thus, instead of pro- 
ducing a single statistical report for a given state, the 
production edit program produces an edit report for each record 
with an error. In some cases this report will indicate an 
automatic correction has been made (thus permitting a manual 
override of the correction when necessary); otherwise, sufficient 
information will be provided to permit a human analyst to 
provide a correction (via follow-up to the original source of 
the data at the state level, etc.). 

Because of the detect/correct cycle inherent in the pro- 
duction edit phase (Figure 1d), a special file system is also 
often required. Data files are loaded into the system when 
first edited, and a temporary expanded version of the file is 
retained. This intermediate file usually has extra data items 
called flags to indicate for each data field whether it has 
passed edit, has failed an edit and awaits correction, has 
failed an edit and has been automatically corrected, has 
failed an edit and is to be left unchanged (and ignored by 
future edit runs on the same file), and so on. This file 
system is often constructed using a special file access method 
(e.g., an indexed sequential access method, direct access 
method, etc.), to facilitate updates to individual records 
(manually-supplied corrections). Once updated, the inter- 
mediate file may be run through the edit process once again 
(to check updated values, and to check fields which were not 
editable because other, prerequisite, fields failed the first 
edit). Each time through the production edit program, the 
flags are used to determine which fields to edit, which to 
ignore, and so on. A final edit run is used to process the file 
back from its intermediate form (with edit flags) to the final 
file format.* 
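
The flag states described above might be represented as follows. This Python sketch is an assumption about one possible encoding: the one-letter codes, the extra UNEDITED state, and the rule that a rerun skips only fields that have passed or are marked to be left unchanged are all illustrative choices, not part of the system plan.

```python
from enum import Enum

class EditFlag(Enum):
    UNEDITED = "U"        # not yet editable (e.g., a prerequisite field failed)
    PASSED = "P"          # passed edit
    FAILED_PENDING = "F"  # failed an edit and awaits correction
    AUTO_CORRECTED = "A"  # failed an edit and was corrected automatically
    FAILED_IGNORE = "I"   # failed an edit; leave unchanged, ignore in later runs

def fields_to_edit(flags):
    """Return the names of fields the next production edit run should check."""
    skip = {EditFlag.PASSED, EditFlag.FAILED_IGNORE}
    return [name for name, flag in flags.items() if flag not in skip]
```

In the final edit run the flags would be dropped, or written as a flag string at the end of the record, when the file is converted back to its output format.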

Finally, in addition to the special file system, editing 
requires some method of inserting corrected or updated values. 
It is often the case that some existing utility or package 
program (e.g., SAS) is sufficient; occasionally a special 
purpose program is required for updates in the edit system. 

General Recommendations 

As indicated in the preface, it was not a major flaw in 
the existing edit system that led to this report, but rather 
the availability of resources (time and SAGE) to permit improve- 
ment of the system. In this vein, the discussion which follows 
deals with a number of major and minor edit system components 
which should be considered in the overall CCD edit process. 
The remainder of this section deals with proposed improvements 
in the system, beginning with two major components, data entry 
and preliminary editing, followed by recommended enhancements 
to the batch production system. 

On-line data entry and edit. This topic has been dealt 
with above, but a few points are worth pursuing. Many desirable 
edits can be performed automatically, using commercially 
available data entry software; the remainder can be programmed 
and added to most such software packages. The opportunity 
to conduct edit checks at data entry time is tremendously 
valuable, since many errors will be found to be due to incorrect 
keying. If one waits until the batch production edit for 
correction, the least expensive opportunity for correction has 
been bypassed; a suspicion about keying error is most easily 

* 

Some edit systems retain a flag string at the end of the 
record, even in the final format. Thus the record length may 
be increased, but the original data field locations are 
usually retained. 
answered at keying time. Further, if one waits until the pro- 
duction edit, and then wishes to check against source data 
forms, one must find the form. NCES will be receiving data 
for 16,000 LEAs and 90,000 schools, making later source docu- 
ment recovery a time-consuming endeavor. Further, if a very 
difficult edit failure arises at data-entry time, the operator 
may be instructed to reject the record, and lay it aside for 
analyst intervention, before the data have even been entered 
into the system. 

The potential of on-line data entry is impressive. There 
are even analysts who propose to totally eliminate batch edits, 
and use only on-line techniques at data-entry time.* While 
this extreme position is debatable on cost-effectiveness 
grounds, it is clear that at least partial implementation of 
edits at data-entry time will be valuable for CCD. 

Preliminary machine edits. The topic of preliminary 
editing has been described extensively above, and only a few 
words need to be said here. The emphasis at this stage is on 
statistical reporting, on a state-by-state basis, of the 
quality of the data base; that is, on what general problems 
can quickly be detected for a given state's data. A reasonable 
goal would be to produce a preliminary edit report within 
seven working days after receipt of data on magnetic tape, and 
within 20 working days after receipt of data in hard-copy 
form. The specific edit checks which can be performed profitably 
at this stage depend on the details of the survey instrument. 

Statistical reporting. Statistical reporting is not only 
an important component of the preliminary edit phase, but of 
the production edit phase as well. Once editing of the data 
is completed, reports should be assembled from both edit stages 
to provide a final picture of the quality of the data obtained 



* Gilb, T. E., and Weinberg, G. M. Humanized Input. Cambridge, 
MA: Winthrop, 1977. 
from each state. Analysis of these reports should dictate the 
kind of general follow-up and technical assistance which may 
be required for long-term improvement of data quality (includ- 
ing forms and instructions revision, better training or communi- 
cation, computer-systems assistance, etc.). 

Automatic correction. In many cases, a data item which 
fails an edit should not require manual correction. Depending 
on the circumstances, one may choose to permit the edit program 
to correct data elements so as to pass the edit ("correct and 
log") or permit it to correct and warn the user ("correct and 
warn"), or permit it to suggest a correction but require the 
user to agree ("suggest and hold"), or permit it only to uncover 
the error ("notify and hold"). The particular option selected 
depends on what data element is involved, what kind of error 
has been committed, and how gross the error is. A situation 
in the LEA Non-Fiscal Survey where automatic correction may 
apply is in item 1, where, in each row of this item, column 3 
is expected to be equal to the sum of columns 1 and 2. Assuming 
there are data in all three columns, what should be done if 1 
and 2 don't sum to 3? Suppose they contained 196, 173, and 367 
respectively; in this case it might be reasonable to assume 
that the clerk filling out the form originally made an addi- 
tion error, obtaining a sum of 367 instead of 369. If this 
were the case, the appropriate correction would be to replace 
367 with 369.* Note, however, that other explanations (and 
thus other corrections) are tenable. For instance, our straw- 
person clerk might have transcribed the value to column 1 
incorrectly, recording 196 instead of 194. Thus, automatic 
correction in such a case might only trade one error for 
others. However, it was found last year when manual inter- 
vention was required prior to correction that in most cases 



* 

Most systems would also enter a code in the appropriate flag 
field to indicate the field had been corrected, and would 
issue a notification to the user. 
of addition checks, the only pragmatic correction was to adjust 
the sum to agree with the addends. Thus, despite the possi- 
bility that the correction might be itself in error, requiring 
approval of a human analyst gains almost nothing. Faced with 
a choice of permitting the known discrepancy to remain in the 
data (one kind of noise), or introducing a correction which 
might be in error (another kind of noise), the human analyst 
chooses the latter. By permitting the computer to make such 
replacements automatically, we save our analyst for more 
difficult tasks. What about the situations where such a 
correction is "unreasonable"? If "unreasonable" can be defined, 
then it can be programmed. For instance, suppose the dis- 
crepancy is quite large; in this case we might wish to correct 
it and draw the analyst's attention to the problem. Thus, we 
use the following automatic correction logic: 

If the reported sum is less than 5% deviant from the 
computed sum, correct and list the correction on 
the log; 

If the reported sum is more than 5% deviant from the 
computed sum, correct and list the correction on 
the "warning" report. 

An even more sophisticated algorithm would be to compare the 
difference of the computed sum and the recorded sum to "5% or 
1.0, whichever is larger" to handle the case of small absolute 
numbers (e.g., 7+2=9). 
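
The correction logic above, including the "5% or 1.0, whichever is larger" refinement, can be sketched in Python as follows. The function name and return convention are hypothetical, and a discrepancy exactly equal to the tolerance is treated here as a "log" case, which the rule as stated leaves open.

```python
def correct_sum(col1, col2, reported, pct=0.05, floor=1.0):
    """Replace a reported total with col1 + col2 and say where to list it.

    Returns (corrected_total, report): report is None when the row already
    adds up, "log" for a small discrepancy, "warning" for a large one.
    """
    computed = col1 + col2
    difference = abs(reported - computed)
    if difference == 0:
        return reported, None
    tolerance = max(pct * computed, floor)   # "5% or 1.0, whichever is larger"
    report = "log" if difference <= tolerance else "warning"
    return computed, report
```

For the row discussed above, correct_sum(196, 173, 367) replaces 367 with 369 and lists the change on the log; a reported 300 against addends of 100 and 100 would be corrected to 200 and listed on the warning report.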

Of course, some errors will not easily be amenable to 
automatic correction (e.g., missing data). Such instances 
must not, however, force the data to remain in the edit system 
forever; there must be provision for releasing data even when 
they are known to be bad, and there must be provision for 
recording, in the data record itself, that the data are bad, 
and should not be used, or used only with caution (cf. 
missing value codes in standard statistical computing systems).* 
Such record-keeping permits the later use of imputation pro- 
cedures, i.e., statistical procedures for replacing bad data 
values during a later, data analysis phase. 

Relational/longitudinal edits. Relational editing involves 
the comparison of a new value for a data element to an old 
value(s), and examination of the amount of change the element 
has undergone. An example is comparing an LEA's reporting 
membership in the present school year to last year's response 
or to responses from several preceding years. The edit is 
performed by determining whether the extent of change from 
preceding values to the currently reported value is within 
acceptable boundaries (e.g., "+7% to -5%"). While such edits 
will not be possible for CCD this year, they must be considered 
now while the edit system is being redesigned.** 
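
A relational check of the kind just described reduces to a bounded percentage change. The following Python sketch uses the "+7% to -5%" bounds from the example; the function name and the handling of a prior value of zero are assumptions.

```python
def within_change_bounds(current, previous, max_rise=0.07, max_fall=0.05):
    """True if the year-to-year change lies within +7% to -5% of last year."""
    if previous == 0:
        # Assumption: no percentage change is defined from zero, so only an
        # unchanged zero is accepted.
        return current == 0
    change = (current - previous) / previous
    return -max_fall <= change <= max_rise
```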

By themselves, relational edits are risky, since they 
permit the introduction of certain kinds of longitudinal biases 
into the data. If erroneous data are introduced in one year, 
and whatever caused the erroneous data is still operating in 
subsequent years, it is unlikely that the error will ever be 
caught. Because of this heavy dependence on the accuracy of 
existing data, relational edits are most often of value when 
used in conjunction with more traditional kinds of edits such 
as value or range checks and arithmetic checks. In early years 
while the relational data base is still being established, 
paired edits (i.e., relational plus conventional) will protect 
one from the longitudinal bias described above; later, it may 
be reasonable to remove many (but not all) of the non-relational 
edits and to retain only those pairs that relate to highly 



* Alternative approaches to flagging missing data are to use 
special values (e.g., -0.0) in the data field itself. 

** 

The data file from last year is not complete and has not 
been thoroughly processed, nor will it be in time to be 
used in the editing of this year's data. 
volatile data elements or that enhance automatic edit-check 
resolution. 

The use of paired edits could be of special value to NCES, 
since it holds the potential both for reducing the number of 
cases that require manual processing (e.g., fail a reasonably 
defined conventional check) and for enhancing the system's 
ability to resolve some edit failures automatically. These 
properties can lead to substantial savings in the time and 
resources required by the edit process. An example may help 
to clarify how a relational edit, paired with a conventional 
edit, may be very efficient. Consider two data elements 
currently included in Part VI-A: student membership and FTE- 
teachers. Currently, these elements are edited by computing 
a pupil-teacher ratio and applying a range check. How would 
one set the acceptable range for this edit? The problem is 
that this ratio can vary from as high as 40 or 50 (as in some 
elementary schools) to as low as 1 (as in some special educa- 
tion classes). If the range for the edit is set this wide 
(i.e., 1 < ratio < 50), it will be almost worthless, since too 
many bad values will slip through; conversely, if the range edit 
is set too narrowly, too many good values will be flagged and a 
substantial amount of manual intervention will be required.* 

Suppose, however, that a relational edit were paired with 
a moderately narrow range check in the following way: if a 
field failed both checks, it would have failed edit. If, on 
the other hand, it passed either check (or both) it would be 
allowed to pass. With this arrangement, a special-education 
school might fail a range check but have the same pupil/teacher 
ratio as last year, and thus pass the paired edit. If another 



* 

Even setting the range contingent on other data element values 
(technically a "matrix" edit), such as whether it is an 
elementary or secondary school, seems unsatisfactory for many 
cases. Legitimate ratios may vary depending on [illegible]graphic 
and other characteristics that will never be feasible to 
program into the edit system. Thus, a completely suitable 
range edit seems beyond reach. 
special-education school was opened to normal students (e.g., 
for mainstreaming), the pupil/teacher ratio would change radi- 
cally from the preceding year, but it would then be within the 
(moderately narrow) definition of normal and pass the range 
check; since passing either of the paired checks is sufficient, 
it too would pass edit. A safe variant of this procedure would 
be to pass a field which met either of a paired set of checks, 
but notify the user via a warning message, so that the decision 
could be checked by hand. Note that the default action of the 
system is to pass the record (saving human effort by having 
the system do what the human analyst would do most of the time), 
but manual intervention is also facilitated by gathering for 
the analyst information only on those data records and fields 
which absolutely require his attention. 
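
The paired edit and its "safe variant" can be sketched in Python as follows. The range limits (10 to 30) and the 10% change limit are hypothetical placeholders for values that would have to be tuned, and the function name is illustrative.

```python
def paired_edit(ratio, last_year_ratio, low=10.0, high=30.0, max_change=0.10):
    """Pass the field if it satisfies the range check OR the relational check.

    Returns (passed, warn): warn is True when only one of the two checks
    passed, so that the decision can be reviewed by hand.
    """
    in_range = low <= ratio <= high
    stable = (last_year_ratio is not None and last_year_ratio > 0
              and abs(ratio - last_year_ratio) / last_year_ratio <= max_change)
    passed = in_range or stable
    warn = passed and not (in_range and stable)
    return passed, warn
```

A special-education school reporting a ratio of 4 in both years passes on the relational check alone; the same school reporting 18 after mainstreaming passes on the range check alone. In both cases the warning flag lets the analyst confirm the decision without holding the record.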

Table-driven edits. Many of the edits to be used involve 
the specification of numeric parameters (e.g., "within 5%" or 
"greater than 5 and less than 40"); based on last year's 
experience with CCD edits, it would be wise to design the new 
system so that such parameters are read from an easily accessible 
table, independent of the program proper. Optimally, this 
table would be installed on an on-line disk pack; then, if 
some initial parameter values are found to be poor (as is 
bound to be the case, since they are only best guesses at 
first), they will be easy to modify. 
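
A minimal Python sketch of a table-driven range edit follows. The table is held in a string here only to keep the example self-contained; in practice it would be a separate file. The edit names, column names, and limits are illustrative assumptions.

```python
import csv
import io

# Stand-in for a parameter file kept outside the program proper.
PARAMETER_TABLE = """edit_name,low,high
pupil_teacher_ratio,5,40
sum_tolerance_pct,0,5
"""

def load_parameters(source):
    """Read edit parameters into {edit_name: (low, high)}."""
    reader = csv.DictReader(source)
    return {row["edit_name"]: (float(row["low"]), float(row["high"]))
            for row in reader}

def range_check(value, edit_name, params):
    """True if value is greater than the low limit and less than the high limit."""
    low, high = params[edit_name]
    return low < value < high

params = load_parameters(io.StringIO(PARAMETER_TABLE))
```

Changing a limit then means editing one row of the table, with no change to the edit program itself.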

Switch-driven edits. Just as some parameters require 
tuning, so does the entire edit system. Most systems use a 
few pieces of data in the input stream to set up basic system 
controls. For example, a "switch" in the input stream might 
be used to signal whether the data file is about to be edited 
for the first time (and therefore read in row one format), or 
has been edited before (and thus is in intermediate, edit 
format), or is to be output from the edit system (and thus to 
be written out in an output format). While such gross control 
switches are familiar, it might be reasonable to consider 
using switches far more extensively to fine-tune the edit 
system. Switches could be associated with each individual 
edit check, and stored in an on-line table for efficient main- 
tenance. Then specific edits might be turned on or off as 
appropriate. This would be especially useful for specific 
cases (states) where systematic edit violations are known and 
unavoidable. For example, suppose that the preliminary edit 
revealed that some detail items are always blank for a given 
state, and a follow-up to the state indicates that the data 
are simply not available. In such a case, turning off the 
checks on the relevant data fields would be in order, saving 
paper otherwise wasted with nonsense edit reports, and sim- 
plifying the task of isolating true edit violations.* 
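
Per-edit switches of this kind might be kept in a small look-up table keyed by state and edit name. In the Python sketch below the state code "XX", the edit name, and the default of "on" for any unlisted edit are all hypothetical.

```python
# Hypothetical switch table: (state code, edit name) -> enabled?
# Any edit not listed is on by default.
EDIT_SWITCHES = {("XX", "graduates_presence"): False}

def edit_enabled(state, edit_name, switches=EDIT_SWITCHES):
    """Look up whether an edit is switched on for a given state."""
    return switches.get((state, edit_name), True)

def run_edits(record, edits, switches=EDIT_SWITCHES):
    """Return the names of enabled edits that the record fails.

    edits maps an edit name to a function taking the record and returning
    True when the record passes.
    """
    return [name for name, check in edits.items()
            if edit_enabled(record["state"], name, switches)
            and not check(record)]
```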

User-oriented interface. Articles and books about systems 
design commonly discuss the importance of the user interface, 
the point at which computer output and user input cross. All 
that can be done in this paper is to reemphasize the point and 
make some suggestions; NCES should keep the issue firmly in 
mind when implementing these specifications. 

Edit systems, in particular, are notorious for generating 
output measured better in pounds than lines, and for requiring 
user inputs (like updates, corrections, and overrides) to be 
in a form dictated by programming convenience. A slight 
increase in the design-and-programming investment to humanize 
the interface will result in significant savings in both cost 
and time during edit processing. The approach will also be 
effective at reducing new errors introduced as part of the edit 
process itself. 

Messages must tell the survey analyst what is wrong, 
where, and provide sufficient information (at least) to allow 
a guess as to why. They must be arranged so that various kinds 



* Of course, for fields with their own estimation indicator 
embedded in the record, the edit system should check to see 
if data are "not available" before proceeding with other 
edits. 
of messages can be found without lengthy searching; messages 
that are defined as critical should be located separately from 
messages that are considered less pressing. The user should 
assist the system designer in figuring out how to describe 
what is wrong, and in selecting the kind of output that will 
help him determine why; the question of "where" will usually 
require two answers, one for the user, and a second for the 
edit system. We consider it reasonable to permit this parti- 
cular piece of computer-oriented information to slip into the 
message system: a record identifier that has been uniquely 
defined by the edit system can lead to significant run-time 
efficiencies, and may even, occasionally, be the only means 
by which the user can distinguish records (as when two records 
for an LEA are received, one of which is spurious and must be 
deleted). 

On the other side of the process, the edit system must 
provide the user with a convenient means of specifying the 
field that he wishes to change, and the value to insert. 
Including a parser in the edit-system-update module, so that 
the user can provide his input in relatively free-form, is 
strongly recommended. One unsophisticated scheme that can 
save considerable effort is to arrange to handle updates in the 
following form: 

record number -- comma -- field number -- comma -- 
new field contents in quotes. 

All spaces, except those within the quotes demarcating the new 
field contents, would be ignored by the parser. For example, 
"6457, 22, '9381'" would be interpreted by the edit system 
as "change field 22 in record 6457 to '9381'"; more importantly, 
the user can also easily interpret this form. This is clearly 
superior to any kind of fixed-format transaction record, and is 
much easier for the user to prepare. It also relieves the 
edit report of the burden of serving the secondary role of 
input form. Many systems use the same report to inform the 
user of the error and to return his or her corrections to the 
edit system, needlessly complicating the message system and 
making it more difficult for the survey analyst to use. 
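
The free-form update transaction described above (record number, comma, field number, comma, new contents in quotes) can be parsed in a few lines. This Python sketch is one possible reading of that scheme: the function name is hypothetical, and it assumes the new contents are enclosed in single quotes and contain no quote character themselves.

```python
import re

def parse_update(line):
    """Parse "record, field, 'new contents'" into (record, field, contents)."""
    # Split off the quoted contents first so that spaces inside them survive.
    match = re.fullmatch(r"([^']*)'([^']*)'\s*", line)
    if match is None:
        raise ValueError(f"cannot parse update: {line!r}")
    head = "".join(match.group(1).split())   # drop every space outside the quotes
    parts = head.split(",")
    if (len(parts) != 3 or parts[2] != ""
            or not (parts[0].isdecimal() and parts[1].isdecimal())):
        raise ValueError(f"cannot parse update: {line!r}")
    return int(parts[0]), int(parts[1]), match.group(2)
```

With this sketch, parse_update("6457, 22, '9381'") yields record 6457, field 22, and the new contents '9381'; a malformed line raises an error rather than silently updating the wrong field.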

To this point in the report the proposals have been 
analytical, theoretical, and general. The final two major 
sections deal more specifically with edit specifications at 
the data item level. Part VI-A is dealt with first; data- 
entry, preliminary edits, and production edits are recommended. 
A similar presentation then follows for Part VI. It should 
be noted at this point that many of the production edits are 
not original in this report but have been adapted from the 
edit system developed for the last cycle of CCD. They are 
included with the new material so as to provide a complete 
system view. 
EDITS FOR PART VI-A: UNIVERSE OF PUBLIC SCHOOLS 

This survey involves the collection of 14 substantive and 
seven utility data elements on every public school in the 
nation. Data are collected from each of the 58 state and 
territorial education agencies (SEAs), which in most cases 
collated the data elements from their existing records.* Data 
are received from the states in three forms: hard-copy (the 
form is reproduced in Appendix A), shuttle-listing (essentially 
a form-facsimile that is computer-generated by NCES from data 
collected during the preceding VI-A cycle, and on which the 
respondent indicates changes only), and magnetic tape (tape 
format specifications are reproduced in Appendix B). The first 
two forms must be keyed by NCES onto machine-readable media, 
at which point they are equivalent in form and format to the 
submissions received on magnetic tape. As a result of the 
multiple forms of submission, three kinds of edits must be 
considered: on-line at data-entry time, preliminary machine, 
and production edits.** 

Table 1 summarizes the edits proposed for the current 
cycle of the Public School Universe Survey. The first column 
of the table contains a list of the data fields, using the 
field name assigned by NCES (see the tape record format docu- 
mentation in Appendix B). A field description is also included 
to assist in associating the data field with the corresponding 
survey item (Appendix A). Across from each data field name 



* 

If a data element is not collected by a given state, it will 
be missing from all Part VI-A individual school records within 
that staters jurisdiction. In such cases, NCES may attempt 
to obtain the data from another source* Negotiations with 
the state are also held to arrange for the collection of 
such data elements in the future. 

A fourth kind of editing, manual/clerical editing of hard- 
copy, is also done, but will not be dealt with here. 
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Table 1 



SnWftW OP EDITS FOR PAFU VI-A, 
PUBLIC SCHOOL UNIVERSE 



Field Nane/Deacripticn 


DAIA EOTRJf 
Edit Cliadc iWTtion 


preLMNARy NSVCMNE EDIT 
Edit C3iock Action 


PRODucnoN Eorr 
Edit Check Action 


1. NCES-nV' 

7-tyte LEA no. 


E^reaenoe, filled Reject en failure 
nineric 

1st 2 bytes nust Reject cn failure 
be vaUid QE state 
oode 


Presenoe Cbunt failures 
Filled nuneric 

1st 2 bytes Frequency distrib. 


Valid lEA id Reject on failure; write 
nmber to special hold files 
and request user action 


2. 9CHN0/ 

5-byte mmeric 
NCES sctool no. 


Presence, filled Inject on failure 
nxneric (chedc valixlity 

Else blank if field '^l) 
#21 * "TT Wtxte rtessage to 
cperator to set 
aside fonns with 
"NEW* schools. 


Preserce Count failures 
Filled nuneric 

Presenoe Crosstab with field 

m 


Presence, filled If absent & school is 

numeric "NEW", assign no. from 
Else blank if field available pool. 
#21 = "N" If present & school ia , 
"closed", retire nunter 
If absent & school ia not 
new or is closed, reject 
record, writing it to 
special noju rue, & 
write message to user 
requesting action. 


3. SEA-H}/ 

sea's id code for 
LEA (20-byte 
alphanirezric) 


Leftr-justify 


Presence Count failures 


Left- justify 


4. SYS-NAMEV 
Nane of LBA 
(30-byte alpha- 
nuncric) 


Presence Iteject on failure 
(if ra3t available 
enter "UNKNOWN" 
& ccnfirni.) 

Left-justify 


Presence Cbunt failures 


Presence Auto-oorrect on failure ti 
"UNKNCm" & write womin 
message alleging cverridfl 
update. 

Left-justi^ 


5. SEASCHU)/ 
sea's code 
for school 
{20*fcyte alpto- 
nuRcric) 


Left- justify 


Presence Count failures 


Left- justify 


6. SCH-l«ME/ 
Sdiool rutie 
(30-byte alpja- 
nuRGric) 


Preaaice Reject on failure 
(if not available, 
enter "imiowr & 
oonfinn) 

loft-juatify 


Presence Count failures 
Freciuency count 
on "UNMCMN" vs 
blank vs all 
other* 


Presence Auto-oorrect on failure 
"UlxKiXKi^" a write wamin 
message allowing o/erridfl 
update. 

Left- justify 


7. STADCEST/ 
1-lVte estima- 
tion indicator 
for field #8 


Legal code - Anto-^corroct on 
(blank or •*N*) failure, field #8 

CdnsiJtenry check dcminant, notify 
ifii^^A Ao\ operator* 


Ifigal oode FLequency oount 

Consistency chock Crosstab with pre- 
senco check on 
field «8 


I^gal oode Auto-oorrect on failure, 
(blank or '^N") field #8 ctominant; noti- 
fication to log. 

(field #8) 


8. St-PWR/ 

Street «vHresa 
(25-byte alpha- 
niXTcric) 


Presenoe Left-justify 

Ocnsistcncy check 
ageinst field «8. 

Reqaest operator 
oonfinratlon. 


Presence Oount fedlures 

Crosstab with field 
#7. 


Presence Lf^ft-justify 

(see field #7); write 
warning nessage to log. 


COTpound check 
on fleldfi «2, 
*5, «5, #3 


3iiTUltanecu8 Reject of fields 
absence 2, 5, 8 are orpty 
& 6 is enpty or 
"UNWCWN" 




SiiTultaneous Reject if fields 2, 5, 
absence & 8 are arpty & 6 is 

"UNKNOW"; write record 
to special hold file & 
request user action. 


9. CTK-ESP/ 

1-byte indicator 
for field #10 


Legal code (blank Auto^riect on 
or "N*) failure, field #8 

Oonsistency chock domijunt; notify 
(field #10) operator. 


Lagal oode Frojuency count 

Consistency check Qrosatab with pre- 
sence check cn 
field #10 


I^gal Code (blank Auto-oorrect on failure, 
or •'N") field #10 dominant; noU 
fication to log. 

Consistency check 
(field 410) 
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l^e 1 (cont'd.) 

SMlRKf CF EDITS TOR PART VI-A, 
PUBLIC SCHOOL UNIVE3SE 



Field Nane/Descxiption 

10. CIT5f/ 

City naine, 13-byte 
aJpheunzBric 


Edit QMCk A:ticn 

Presence l^ft-juatify 

Ccaisistercy check 
against field «9; 
request operator 
confixmation 


1 PREUMDCU^ MACHINE EDIT 

Edit Check Acticr 
Preaenoe Count failures 

Qrosstab with field 
#9 


PPDIXCnCN EDIT 

Sdit Check Action 
Presence Left-justify 

(see field #9) ; write 
vaming nessu.^a to log. 


11. ST-ABBRV/ 
2-)birte state 
abbreviation 


Presence Autooorrect on 

f2Lilure using 1st 
A oybcs uc rxeid 
*1 (OBoode) & 
notify operator. 

Legal ooda if pre- Reject en failure. 
9eic (U.S. Postal 
Service) 


Presft-nce Count fedlures 
Lagal code Frequency Distrib. 


Presence Auto-correct on failure; 

ge«rate value using 1st 
2 bytes of field #1 
(CE code) 

Legal code if pre- Auto-correct as above? 
s^it (U.S. Postal write worning Message 
Service) allowing override/update 
{ivi case of error in 
field a) 


12. znctEST/ 

1-tayte indica- 
tor for fieia 
#13 


Legal coda (blank ii„4.rw^«-^«^ «^ 
«M"V Auto-oorrect cn 

°^ ^ ' failure, field 
Ccjfisisbency cteck #13 donirant 

(field #13) 


Legal code Frequency Distrib. 

Consistency check Crosstab with pre- 
sence check on 
field #13 


Legal code (blank 

or "N") Auto-correct on failure, 
Consistency check field #13 doninant, 
(field #13) notification to log. 


13. Zir-CD-5/ 

S-byto ntcicric 


Presence consistanry check 
against: field fl2 

Filled nursric cn failure, reject 
or update field 
#I2 to -IT, bUnk 
field #13 


Presence Count f &ilucaa 

Filled nutirxic Crosstab with 
field #12 


Presence If filled r»neric & con- 
sictcnt with field tl2. 

Filled mroric . 

If blank & oonsistint with 
#12, pass. 

If filled or blemk £ 
ijxmsistent with 112, 
update #12, notify log, 
& pass. 

If non-blank & rot filled, 
foroe #12 to "V, #13 to 
blank, a write warning 
nessage allowing ovoztld^ 
update. 


14. SCH-TYPE 

1-byte nuncric 


Legal code Raject on failure 
("1- to "T") (operator could 
correct to blank) 


Legal code Frotjuency Distrib. 


Legal code flubo-oorroct to blank, 
("l" to "7") wetming nessage allowing 
override/update. 


15. GRD-SPAN LO/ 
irMtr limit of 

lyte alpha-- 
lunsric 
(see al«o 416) 


1^1 ooda Rsject on failure 
(TK", 

"Ol" to "12", 
"UC- blank) 


Legal code Frequency Distrib. 


lagal code Auto-correct to blank, 
(•*PK", "TC", write warning nessage 
"01" to "12", allowing override/update 
"UC", blank) 


16. QIVSPAN HI/ 
U^jper limit of 
grade span, 2- 
byta alpha-- 
numeric 


I-egal code Reject on failure 
(see #15) 

Consistency check Reject or forced 
#l^blank if #15 auto-oorract 
blank, #1^"UC" 
if #l&i-tE", else 
#16 > #15 


Legal code Frequency Distrib. 

Consistency check Crosstab with field 
#15 to verify #16 > 
#15 or both blank "* 


Ifigal code Auto-corroct to "UC" if 
(see #15) #15 « "UC" else blank & 
write vaming nessage 
allcwing update/overxide 

Auto-correct & write wim* 
ing nessage 
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T^ble 1 (oont'd.) 



Sumery of Blits far Part VI-A, 
Public School Universe 



Field ^taBn^/De^criptix3n 


Edit Check Acticn 


PFEUMIMtfSf WftTHTMR EDIT 
Edit OiBck Action 


PRDDOCnON EDIT 
Bdit Owck Action 


Ocnsistency check 
on fifllda »14, 
#15, #16 


Gbnaistency diock 
114 #15 & #16 

1 -or, TK", -1Q3*, 
"0.1" thru "08" 

2 Any 3 or 4 of 
"03" thru "09" 

2 "UC", "07" thni 
"12" 

4 Any 

5 Any 

6 -07" thru -12" 

7 Any 


Consistency check Croestab 
#14 #15 & #16 

1 nxr, "TK", -KG", 
"01" thru "08- 

2 Any 3 or 4 of 
"03" thru "09" 

3 "UC", "07" thru 
"12" 

4 Any 

5 Any 

6 "07" thru "12" 

7 Any 


Consistency check Auto-oorrect #15 & #16 
#14 #15 & #16 to blank; write weming 

1 -CK" "PK" "K3" "'^ssage allowing updats/ 
"01"'thru "08" ' c'^'^s^^^ 

2 Any 3 or 4 of 
"03" thru "09" 

3 -UC", "07" thru 
"12" 

4 Any 

5 Any 

6 "07" thru -12" 

7 Any 


17. TOJ-ESr/ 

1-byte indica- 
tor for field 
#18 


lagal code ("*", "^", Auto-oorrect to 
"N*, or blank) N if field #18 
rw>«<-*— blank, else auto- 
Cbnsistency check correct to ''^" & 

warn (anibiguity 
vs "*") 


Legal code Frequency Distrib. 

CcnFistaicy check Crosstab with pce- 

sence check cn field 
#18 


Legal code ("*", "fJ", Auto-correct to N if field 

consistency cf-eck ^^^^ ^ * 
v^«x^L«»,y ui**^ warning message 


18. TEPiCXFTE/ 
FTE of claaa- 
tocin te&chers, 
4-byte runeric 
fixed point no. 
in F4.1 fonnat 


Presence Oonsistency chedc 
agaijist #17 

Justified^ filled Cperato: nust oiter 
nuieric no. ritiht-justi- 
riffl, Iciading 
zeroes; reject 
en failure 

vrcjix*: uiLS ce 9eu up as a ?y u 
aitry to fbroe deciiml) 


Presence Ftequency Distrib. 

Range chcci\. Mean, Mauian, 5 
highest values, 
5 ICMest valuesi 

ueju cn 


Presence oonsistency check against 
#17 on failure, write 

JiBtified, fiUed y?^^^' ? 
j^f^j^c bl*mk; auto-oorrect #17 
to N 


19. McM»-E3r/ 

1-tayte indicator 
for field #20 


Legal code ("*", "JT, Autu^ouiaect to 

'V, or blaz^c) N if field #20 

rr^mim^m^ . I ■ 1 Wflnk, else auto- 
Onsistency check ^^^^^^ ^ .jj. ^ 

warn (anbiquity 

"^" V8 "•") 


Iflg?''. code Frequency Distrib. 

Ocxuistency check Crosstab with pre- 
sence check on field 
#20 


Legal code ("*", "^-, Auto-oarroct to N if fiak 
"N*, or blank) #20 blank, else auto- 


20. fCMBKSHP/ 

4-byte nixneric 
pupil owlxsi— 
ship 


Preseioe, nmeric Oonsistency check 
against #19 

Right-justify 


Preoqge Ftequency Distrib. 

Range chedc Moan, Median, 5 
highest values, 
5 lowest values 


Presence Consistency check against 
#19 on failure; write 

jxistified, filled Sfl^;uSS:s^?1i? 

ruroric blank; auto-oarcect #19 

to N 


Consistency chedc 
against fields 
#18, #20/ #20 ^13 
if both present 
pupil/ teacher 
ratio 


If #18 & #20 are pre-* 
sent 

If school type ^5 Cn failure, 
#20 f #18 is > 12.0 request verifi- 

< 35.0; cation of #18 
Else #20 T #18 & #20 

> 3.0 

1 20.0 


Ocn^ute itean, iradian, 
5 highest values. 
Range check 5 lowest values, by 
school t^^pe 


If #18 & #20 are pre- write warning nesssqs on 
sent failure to allow updata/ 

If scSwol type 5 override; leave fields 
#20 ^ #18 is > 12.0 #18 & #20 unchanged* 
< 35.0; 

Else, emit check 


21. 

l-byta alpha- 
nuisric field 
to indicate 
flchODl status 
(new, clooed, 
existing) 


I^gal code Reject on failure 
(-N", "C", blank) 

Qansistoncy check See field #2 


Legal code Frequency Distrib. 

Consistocy check Crosstab with pre- 
sence on field #2 


Teiyil code Auto-corxect to blank if 
(-N-, "C", blank) field #2 has legal mtry 
else reject; wzite to 
special hold file & 
request \iser acticn 
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and description are three pairs of columns, one pair corres- 
ponding to each kind of edit treated herein: on-line at data- 
entry, preliminary machine, and production. The left-hand 
column of each pair describes the condition to be screened for 
(e.g., "presence" — is something present in the field?), while 
the right-hand column generally indicates the action to be 
taken if the eclit is failed (e.g., "if nothing is present in 
the field — if it is blank — the record should be rejected"). 

Because terminology is not well standardized in this area,
it will be useful to discuss the proposed edits in some detail.
In order to organize the presentation, we will treat each phase
of editing as an entity, going down the pairs of columns of
Table 1, one at a time. Later we will discuss some implications
that are only apparent when the edits are examined across
phases.

Data Entry Edits 

The first data field contains the NCES numeric identification
number assigned (for the most part) uniquely to every LEA
in the country. It is a fundamental identifier in NCES's data
collection system, and may be cross-referenced to other data
bases, including those of the Bureau of the Census. It is a
critical data item, and is absolutely required. The edit
criteria at data entry are stringent: the field must have
data ("presence"), and the data must be, precisely, a 7-digit
number, since all LEA numbers are of this form. Further, the
first two digits (left-most) must correspond to a valid Office
of Education (OE) code for one of the 50 states, the District
of Columbia, or 7 territories. This means that the first two
digits must be a number between 10 and 69. If the record
fails to meet any of these criteria, the data-entry system
must reject the record. In practice this means that the key
operator is notified that the record has been rejected, but
the record remains on the CRT screen. The operator would
then visually verify the keyed record to determine if a keying
error had been made. If so, the operator would correct the
field "on-line" and resubmit it to the system. If no keying
error had been made, and the entry in the raw form was in fact
not legal, then the operator would be instructed to set the
form aside, and to pass it on at the appropriate time to an
NCES analyst. (This analyst would then have to resolve the
problem before it was ever entered into the system.) Having
set the form aside, the operator would proceed to key
subsequent records.
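The data-entry checks on field #1 described above can be sketched as follows. This is only an illustration in Python, not the data-entry system itself; the function name and the returned labels are assumptions, and the 10 to 69 range is taken directly from the paragraph above.

```python
def check_lea_id(raw: str, low: int = 10, high: int = 69) -> list:
    """Return the names of the failed checks for field #1 (empty list = pass)."""
    if raw.strip() == "":
        return ["presence"]
    # "Filled numeric": exactly seven digits, no blanks or letters.
    if len(raw) != 7 or not (raw.isascii() and raw.isdigit()):
        return ["filled numeric"]
    # The first two digits must fall in the range of OE state codes.
    if not (low <= int(raw[:2]) <= high):
        return ["state code"]
    return []
```

A non-empty result corresponds to "reject on failure": the record stays on the screen so the operator can compare it with the source form.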

Another edit could be included for this field, but its
value is questionable in a real-time system. The system could
check the entire 7-digit number against a table of legal LEA
codes, but this table contains approximately 16,000 entries,
organized into only 58 groups, and thus the check would be
fairly costly in computer resources. On the other hand, this
check might be very cost-effective in a batch edit environment
(see Production Edit column for field #1).

Field #2 is the NCES school number, which is to be a
unique identifier for each school in a given state (i.e., it
must be combined with the two-digit state code to be unique
nationally). Edits are specified for this field in Table 1,
since numbers were assigned for many states during the 1977-78
tryout of this survey. However, SAGE has recommended new
permanent numbers be assigned this year, in which case this
edit would be skipped at data-entry.* If the field is to be
edited, the check would be similar to that for the first field.
Most legal entries should consist of 5-digit numeric entries;
in addition, the field could legitimately contain all zeroes
or blanks, but only if field #21 ("NEWCLOSD") indicates that
it is a new school. Any time field #2 contained other than
numbers or blanks, or any time it failed the numeric-presence
check and field #21 did not indicate a new school, the record



*Fingerman, P. W. Letter report to Mr. Warren Hughes,
Institutional Surveys Branch, Division of Elementary and Secondary
Education Statistics, NCES. 7 March 1979.



would be rejected. In addition, it is recommended that when
field #2 contains all zeroes or blanks and field #21 indicates
the school is new (i.e., legal entries signalling a new school),
the record be validated. The number of new schools is
relatively small, and their verification is reasonable compared to
the risk of fouling the identification scheme for such an
important longitudinal data base.*
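The interplay between field #2 and field #21 can be sketched as follows. This is a minimal illustration, assuming that "N" in field #21 marks a new school (as in Table 1); the function name and the three outcome labels are hypothetical.

```python
def check_school_number(schno: str, newclosd: str) -> str:
    """Data-entry edit for field #2; returns "accept", "verify" or "reject"."""
    is_new = newclosd.strip() == "N"
    unassigned = schno.strip() == "" or schno == "00000"
    if unassigned:
        # Blanks or zeroes are legal only for a new school, and such
        # records are set aside so the new school can be verified.
        return "verify" if is_new else "reject"
    if len(schno) == 5 and schno.isascii() and schno.isdigit():
        # The text does not address a numbered record flagged as new;
        # it is simply accepted in this sketch.
        return "accept"
    return "reject"
```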

The third field contains the state education agency's
(SEA) code for the LEA, if any. It is an optional field
maintained for the convenience of the states, should they request
the data base in the future, or should they request a
shuttle-list for future responses. The notation "left-justify" in
the action column means that leading blanks, if any, should
be stripped off by the data-entry program.

The fourth field is reserved for the name of the LEA.
Table 1 indicates a presence check: reject the record if
there is nothing in this field, and perhaps allow the data entry
operator to enter a placeholder, "UNKNOWN", to fill the field.
This field is also to be left-justified.

The fifth field is meant to contain the SEA's code (if
any) for the school. It is treated just as the third field is.

The sixth field is for school name, and an entry is
required. Thus, if present, the operator keys the name, and
the program left-justifies it. If no name is present, the
program substitutes "UNKNOWN" and asks the operator to confirm
by checking the raw form.

*This field could be afforded considerable additional
protection were a check-digit added. The logic for handling
check-digits is automatic in most commercial data entry systems.
This topic is examined again in the context of the production
edit system.

The next pair of fields are linked: field #7 is an
indicator field for field #8, street address of the school. Field
#7 must be blank unless the street address is not available,
in which case it should contain "N" (for "not available").
This is set up primarily for tape submissions of data, to
verify that the address is not available in state records, as
distinct from accidentally omitted or missing. Field #8 is
left-justified, and should be checked to see if something has
been entered. If so, field #7 is expected to be a blank. If
there is a street address and the indicator is an "N", field
#7 should be automatically corrected to a blank (i.e., field
#8 is "dominant" over field #7), and the key operator should
be notified. This notification would provide an opportunity
for the operator to correct the situation if the entry in field
#8 were accidental, and field #7 correctly contained an "N".
If there is no entry under street address, and field #7 contains
a blank, the system converts this blank to an "N" and requests
operator confirmation.
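The indicator/data pairing, with the data field dominant, might be sketched as below. The function name, the message wording, and the choice to convert a blank indicator to "N" when no data are present are assumptions made for illustration.

```python
from typing import Optional, Tuple


def reconcile_pair(indicator: str, data: str) -> Tuple[str, str, Optional[str]]:
    """Apply the indicator/data consistency edit; the data field is dominant.

    Returns (indicator, data, message); message is None when no operator
    notification is needed.
    """
    data = data.strip()          # left-justify: drop leading (and trailing) blanks
    indicator = indicator.strip()
    if data:                     # something was keyed in the data field
        if indicator:            # e.g. "N" although an address is present
            return "", data, "indicator blanked; confirm data field entry"
        return "", data, None
    if indicator == "N":         # data legitimately not available
        return "N", "", None
    return "N", "", 'indicator set to "N"; confirm data not available'
```

The same routine would serve the other indicator pairs (#9/#10 and #12/#13), since their edit checks are analogous.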

The next edit involves a check for simultaneous absence 
of all identification information regarding the school. Thus, 
if a record has no NCES school number, no state school number, 
no school name (or "UNKNOWN"), and no address, then it is not 
an acceptable record. In other words, some information is 
required which distinguishes the school from others in the 
same LEA. 
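A minimal sketch of this compound check follows, assuming each field arrives as a fixed-width string; the function and argument names are hypothetical.

```python
def lacks_identification(schno: str, sea_code: str, name: str, address: str) -> bool:
    """True when fields #2, #5 and #8 are empty and #6 is empty or "UNKNOWN"."""
    def empty(value: str) -> bool:
        return value.strip() == ""

    name_missing = empty(name) or name.strip().upper() == "UNKNOWN"
    return empty(schno) and empty(sea_code) and empty(address) and name_missing
```

A record for which this returns True would be rejected, since nothing distinguishes the school from others in the same LEA.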

The ninth and tenth fields are paired in the same way as
#7 and #8. Number 9 is an indicator for #10, city name. The
edit checks are analogous to those for #7 and #8 respectively.

Field #11 is a two-character data item containing the U.S.
Postal Service's abbreviation for the state. A check for
presence is made and, if the field is blank, the system supplies
a state abbreviation using the two-digit OE state code from
field #1. A check is also made to verify that the entry is
a "legal code," i.e., a legitimate abbreviation for one of
the 58 reporting units. If the entry is not legal, the record
is rejected. The operator may then correct the entry if
possible, or put it aside for later handling by an NCES
analyst.

The next two fields, #12 and #13, are again an
indicator-data field pair, and the consistency of the relationship must
be checked as between fields #7 and #8. In addition, field
#13 is to contain a zip code, and thus the five digits must
be filled.

Field #14 is a one-digit item indicating type of school.
It can take on seven legal values ("1" to "7"); if it is not
one of these the record is rejected. Fields #15 and #16
contain data on grade span, and are subject to the same kind
of legal code check. Field #16 is further required to be
consistent with #15. The next entry in Table 1 is a matrix edit
for fields #14, #15, and #16, designed to check the consistency
of the information in these three fields.
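The consistency rule between fields #15 and #16 given in Table 1 (both blank, both "UC", or upper limit above lower limit) might be sketched as follows. The grade ordering list and the function name are assumptions; the strict "greater than" comparison follows the table as printed.

```python
# Assumed ordering of the graded codes, lowest to highest.
GRADES = ["PK", "KG"] + ["%02d" % n for n in range(1, 13)]


def grade_span_consistent(lo: str, hi: str) -> bool:
    """Check field #16 (hi) against field #15 (lo)."""
    lo, hi = lo.strip(), hi.strip()
    if lo == "" or hi == "":
        return lo == hi                  # blank only if both are blank
    if lo == "UC" or hi == "UC":
        return lo == hi                  # "UC" only if both are "UC"
    if lo not in GRADES or hi not in GRADES:
        return False                     # illegal code in either field
    return GRADES.index(hi) > GRADES.index(lo)
```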

Field #17 is an indicator field for field #18, which is
to contain data on the number of full-time equivalent (FTE)
classroom teachers in the school. Field #17 is checked for
a legal code, and for consistency with #18 (and automatically
corrected to correspond to #18, notifying the operator if such
correction is necessary). According to the instructions
(Appendix B), the number entered in #18 is to the nearest
tenth, but no decimal point is keyed. This introduces the
possibility of an incorrect entry due to mispositioning the
value in the field. For example, if the correct value were
101.1, and it were miskeyed as " 101", it would be interpreted
as 10.1; if the correct value were 10.0 and it were miskeyed
as "100 ", it could be interpreted as 100.0.* Thus the edit
requirement that the operator enter the value right-justified
and zero-filled is recommended to reduce keying errors. An
alternative would be to set up the data entry program to accept
this field as two entries, an integer part and a fraction
part, and to require the fraction part be entered (i.e., a
required field). In any event, the point is to prevent
positional errors in entering this item.
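A short sketch of the right-justified, zero-filled requirement for the F4.1 field follows. The function name is hypothetical; the point is that an entry containing blanks is rejected rather than silently shifted.

```python
def parse_fte(raw: str) -> float:
    """Interpret a 4-byte F4.1 entry: four digits, implied decimal point."""
    if len(raw) != 4 or not (raw.isascii() and raw.isdigit()):
        raise ValueError("FTE must be keyed as four digits, zero-filled: %r" % raw)
    return int(raw) / 10
```

With this check, "1011" is read as 101.1 and "0100" as 10.0, while the miskeyed entries " 101" and "100 " from the example above are rejected for re-keying.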



*Keying " 101" means, literally, keying blank-one-zero-one.

Fields #19 and #20 are another indicator pair; once again
they must be checked for consistency. In addition, #20 is a
numeric field, which must contain only numbers or leading blanks.

If both fields 18 and 20 contain data, a further edit is
possible: a matrix range check on pupil/teacher ratio. As
noted above, this check is not optimal, but is often the best
alternative when no relational checks are available (see below).
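A table-driven sketch of the pupil/teacher ratio check follows. The limits are those shown in Table 1 (12.0 to 35.0, and 3.0 to 20.0 for the exceptional school type); treating school type "5" as the exception is an assumption, since the operator in the table is partly illegible, and the names used are hypothetical.

```python
from typing import Dict, Optional, Tuple

DEFAULT_LIMITS = (12.0, 35.0)
# Assumption: school type "5" is the type given the wider, lower range.
LIMITS_BY_TYPE: Dict[str, Tuple[float, float]] = {"5": (3.0, 20.0)}


def ratio_ok(membership: Optional[float], teachers_fte: Optional[float],
             school_type: str) -> Optional[bool]:
    """Return None if the check does not apply, else True (pass) or False."""
    if membership is None or teachers_fte is None:
        return None                      # check requires both fields present
    if teachers_fte == 0:
        return False                     # ratio undefined; ask for verification
    low, high = LIMITS_BY_TYPE.get(school_type, DEFAULT_LIMITS)
    return low < membership / teachers_fte < high
```

Keeping the limits in a table rather than in the program logic allows them to be adjusted without reprogramming.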

The last field is used to indicate whether the school is
new, has closed during the past year, or continues to operate.
Three codes are legal, and are checked for; the consistency
check between this field and #2 (NCES school number) has
previously been discussed. It may be possible in the future,
in cases of disagreement between this field and #2, to verify
the school characteristics (e.g., name) against values from the
preceding year to resolve the discrepancy (see relational
edits below).

Three final points should be emphasized. First, data
entry operators should be clearly instructed on what to do when
a record fails an edit. Generally, they should be told to check
to make sure no keying error has been made. If such an error
is discovered, the data-entry program should permit them to make
a correction and then to edit the record again. If the error
is contained in the source document, the document should be put
aside for review by subject matter analysts (e.g., the survey
sponsor and staff). Second, the edit program must be flexible.
Edits must be adjustable, especially range-checks (see
table-driven edits, above). The program should permit edits to be
over-ridden or "soft-failed" when they are advisory in nature,
or when a proportion of correct records is expected to fail a
given range-check edit (e.g., pupil/teacher ratios). If these
records also fail a production edit, you will know that the
failure was not due to a keying error (since it failed and was
checked at entry time). Finally, the data-entry program should
allow the edits to be turned off. One example of when this is
useful is when input is to be keyed and verified. Since edit
checks are performed at entry time, it is a waste of on-line
resources to perform the same edit checks again during
verification. Only records which are changed during
verification require re-editing, and the program should be set to turn
edits back on automatically in this case.
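The "soft-fail" and switch behavior described above could take roughly the following shape. This is a sketch only; the Edit structure, the argument names, and the idea of passing operator overrides as a set of edit names are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass
class Edit:
    name: str
    check: Callable[[dict], bool]   # returns True when the record passes
    soft: bool = False              # soft (advisory) edits may be overridden


def run_edits(record: dict, edits: List[Edit], enabled: bool = True,
              changed: bool = False,
              overrides: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the names of edits that still block the record."""
    # With edits switched off (e.g. during verification) nothing is run,
    # unless the record was changed, which turns editing back on.
    if not enabled and not changed:
        return []
    blocking = []
    for edit in edits:
        if edit.check(record):
            continue
        if edit.soft and edit.name in overrides:
            continue                # operator confirmed an advisory failure
        blocking.append(edit.name)
    return blocking
```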

A further point has to do with relational/longitudinal
edits and on-line data entry. Data base access methods and
state-of-the-art computer technology make such edits feasible
in an interactive environment, although somewhat expensive.
If NCES expects to continue to receive a substantial
proportion of data in hard-copy, then a move to add such edits will
probably be justified. Whether performed at entry or held
until batch, the setting of relational edit parameters, i.e.,
how much deviation from last year's value should be tolerated
before the comparison is considered suspicious, depends on the
stability of the data item. Later this year SAGE plans to
conduct empirical studies on the longitudinal behavior of some
CCD data items in order to lay the groundwork for relational
editing in the next cycle.*
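A relational edit of the kind discussed here reduces to comparing this year's value with last year's against a tolerance. The sketch below is illustrative only: the function name is hypothetical, and the tolerance is deliberately left as a parameter, since appropriate values were to be set from the empirical studies.

```python
def is_suspicious(current: float, previous: float, tolerance: float) -> bool:
    """Flag a value whose relative change from last year exceeds tolerance."""
    if previous == 0:
        # No base for a relative change; any non-zero value is flagged.
        return current != 0
    return abs(current - previous) / abs(previous) > tolerance
```

A stable item (e.g., membership in an established school) would be given a small tolerance, while a volatile item would need a larger one to avoid flagging too many correct records.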

Preliminary Machine Edits 

The preliminary machine edits presented here were developed 
with two factors in mind: first, since they are the first 
edits for data submitted in machine-readable form, they must 
be sensitive to many of the same errors which drove the design 
of the data entry edits. Second, the primary purpose of the 
preliminary edit is to discover statistically systematic edit 
problems which might be fixed programmatically as, for example, 
when a state frequently but consistently uses an alternative 
code value (e.g., "K " for "KG"). Such errors can often be 
fixed prior to the production edit with a simple reformatting 
or transformation program. 
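A reformatting program of this kind can be very small. The sketch below recodes one field according to a mapping of alternative codes to legal codes; the record layout (a list of dictionaries) and the field key are assumptions for illustration.

```python
from typing import Dict, List


def recode(records: List[dict], field: str, mapping: Dict[str, str]) -> List[dict]:
    """Return copies of the records with alternative codes replaced."""
    return [{**rec, field: mapping.get(rec[field], rec[field])} for rec in records]
```

For the example in the text, the mapping would be {"K ": "KG"}, applied to the grade span fields of the one state concerned before the production edit is run.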



*As mentioned previously, such edits are not possible this year.



For these reasons, preliminary machine edits will often
be found to be less thorough but partially redundant with data
entry edits. Nevertheless, it is recommended that keyed data
be passed through the preliminary machine edit program. Often
a bias may thus be revealed that was not apparent looking at
records one at a time during data entry. For example, suppose
that a state accidentally ignored the "NEWCLOSD" field, and
failed to report schools that were opened or closed since the
last survey. The data-entry system would catch the new schools
(no school number in field #2, and no indication of new in the
"NEWCLOSD" field) via a consistency check; such records would
be rejected, and the operator would set aside the form. However,
the absence of closed schools would not be caught by the entry
system, and the operator is not likely to notice either. A
frequency distribution on the "NEWCLOSD" field produced by the
preliminary machine edit program, however, would quickly reveal
the complete absence of closings (and, if run before the analyst
reviewed and corrected the set-aside new schools, the absence
of openings). If this were not sufficiently suspicious, a
quick comparison with the number of school status changes
reported last year by this state might settle the matter.

The specific preliminary machine edits proposed have been
coded into a SAS program by SAGE. They are also listed in
the middle pair of columns in Table 1. No field-by-field
treatment is necessary here since these edit checks are
essentially a subset of those described above for data entry.
However, the "actions" shown are entirely different,
corresponding to the statistical purpose of these edits. The actions
of this edit are various kinds of counting, and the outputs
are generally frequency distributions (either one-way, or
cross tabulations). In addition, ranges of numeric items are
determined. Finally, alphabetic characters in numeric fields
are scanned for, and a detailed report is provided for (up to)
an arbitrary number of such errors (the default is 50, mostly
to save paper). If more than the default number of illegal
numeric conversions are found, they will be counted, and a
field-by-field summary printed.
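The SAGE program itself was written in SAS and is not reproduced here. Purely as an illustration of the kinds of counting involved, the following Python sketch produces a one-way frequency distribution, a crosstab of a code field against presence of another field, and a capped listing of non-numeric entries. The function names and the record layout are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def frequency(records: List[dict], field: str) -> Counter:
    """One-way frequency distribution of the values in a field."""
    return Counter(rec[field] for rec in records)


def crosstab(records: List[dict], code_field: str, data_field: str) -> Counter:
    """Count records by (code value, whether data_field is present)."""
    return Counter((rec[code_field], rec[data_field].strip() != "")
                   for rec in records)


def scan_numeric(records: List[dict], field: str,
                 limit: int = 50) -> Tuple[List[Tuple[int, str]], int]:
    """List up to `limit` non-numeric entries and count all of them."""
    details, total = [], 0
    for position, rec in enumerate(records):
        value = rec[field].strip()
        if value and not value.isdigit():
            total += 1
            if len(details) < limit:
                details.append((position, rec[field]))
    return details, total
```

In the "NEWCLOSD" example above, a frequency distribution showing no "C" codes at all for a state would reveal the absence of closings at a glance.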

Despite the thoroughness of the screening provided by 
this program, the full output will rarely exceed 20 pages, 
making it practical to determine the overall quality of a 
newly submitted magnetic tape almost immediately upon receipt. 
Follow-up when problems are revealed is facilitated since the 
analyst at the state level will still have the project fresh 
in mind. Also, the rapidity of the screening provides a 
maximum amount of time for writing special-purpose programs to 
fix any systematic problems uncovered. 

While no relational or longitudinal edits are included
in the program code prepared this year, it would be simple to
add such checks using SAS. In fact, experimental versions of
this kind of code will be derived during the empirical studies
of longitudinal editing proposed for SAGE later this year.

Production Edits 

In many respects, the production edit system is an exten- 
sion of the data-entry edit system, executed on large batches 
of records in one run. Many of the same edits are included; 
in fact, the version of the production system recommended for 
this year extends only slightly the protection afforded by 
data entry checks. The system is necessary this year neverthe- 
less, and will become increasingly valuable in the future. 
First, it is required since a substantial portion of the data 
will be received on magnetic tape, thereby not being amenable 
to NCES screening at data entry. The production edits also 
provide an opportunity to check entries that have been corrected 
subsequent to initial data entry (e.g., during verification). 
Perhaps most importantly, more sophisticated methods of 
automatic correction of suspicious and erroneous data items 
can be employed, because of the nature of the interaction 
between the computer edit system and the subject matter analyst. 
The production system should be permitted to make a great
number of changes in flagged data automatically, programmed
according to the available experience and provided with a
series of "best guesses". This is done in full knowledge that
all of its decisions are subject to review and that its message
system is arranged to call our attention to those changes about
which we are least confident.

The batch system is specified to add a few edit checks 
that were considered optional at data entry time, primarily 
because of the additional computing resources required. 
Resources are often available in background tasks at lower 
cost than in an interactive foreground environment. Finally, 
this system will be the future locus for an extensive set of 
relational/longitudinal edits. 

Because of the similarity between the data entry and
production edits, only salient differences will be discussed in
detail here. The reader is referred again to Table 1, and
particularly to the first (data entry) and third (production)
pairs of columns on the table. The first field represents a
situation in which a more stringent check is proposed for the
production system. At data entry time a check on the pattern
of the NCES LEA identifier was recommended, guaranteeing the
presence of a seven-digit number which (at least) could be a
legal LEA code. In the production edit it is proposed that
the LEA identifier be verified against a list of legal values.
A further, optional check (not in Table 1) would be to verify
both the LEA identifier and the LEA name against a list of
legal entries.*



*Special methods are required for matching records on
alphanumeric fields like LEA name if one is to avoid too many false
non-match occurrences. One technique is to strip out vowels,
blanks, and special characters from two fields to be compared
before testing for a match. However, this much protection may
not be necessary at present. The clerical screening of
hard-copy survey material affords one opportunity of checking the
validity of this identifier, and manual spot-checks on
machine-readable submissions may be more efficient than including such
an edit for magnetic tape files on a routine basis. Such
spot-checks could be included as part of the preliminary edit
program, implemented by having that program print a sample list of
LEA numbers from the files which could then be validated by
clerical staff.
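The stripping technique mentioned in this footnote can be sketched in a few lines. The function names are hypothetical, and treating "Y" as a consonant is an assumption.

```python
import re


def match_key(name: str) -> str:
    """Reduce a name to upper-case consonants and digits for comparison."""
    return re.sub(r"[^BCDFGHJKLMNPQRSTVWXYZ0-9]", "", name.upper())


def names_match(reported: str, on_file: str) -> bool:
    """Compare two names after stripping vowels, blanks and special characters."""
    return match_key(reported) == match_key(on_file)
```

Under this rule "Fairfax County" and "FAIRFAX CNTY." both reduce to "FRFXCNTY" and are treated as a match, at the cost of occasionally matching names that differ only in their vowels.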
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The second field, school number, is perhaps the most 
troublesome. While LEA numbers have been in use for some time, 
and are widely familiar to both respondents and data users, 
permanent school numbers are an entirely new form of
identification which may require some time before they are well
established. If SAGE's recommendation is followed, and
new permanent numbers are assigned this year, little editing
is required on this field (e.g., a check for presence and
numeric fill). In future cycles a variety of relational edits
would be imposed on this field, including a check of reported
number and reported name for consistency against the preceding
cycle. No parallel edit is feasible this year if new numbers
are not assigned.*

The production edit program has an additional responsi-
bility with regard to the school-number field: it controls
the status of such numbers, assigning new ones to new schools,
and retiring numbers assigned to schools which close. For this
function it must access and maintain a school-number file which
indicates the active, available, and retired numbers on a state-
by-state basis. The program should also maintain an audit
trail of all activity against this file for historical purposes.**
The reader's attention is directed to Table 1 for a description
of routine control procedures.



*Despite the fact that the survey content has been frozen
through 1981, serious consideration should be given to
modifying the hard-copy, shuttle-list, and tape formats, in
order to extend the length of the school number field by one
byte. This byte would be used to hold a modulus-11 check-
digit for the school number field, increasing the integrity
of this field considerably. This change in format could be
made next year, using school numbers generated this year
which contain check-digits from the start. Only the shuttle-
list states would be affected at all, and even those
only slightly.
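
A modulus-11 check digit of the kind proposed here might be
computed as in the sketch below. The report does not specify a
weighting scheme: the weights 2, 3, 4, ... applied from the right-
most digit, and the practice of simply not issuing base numbers
whose check value would be 10, are assumptions borrowed from common
modulus-11 conventions. All function names are hypothetical.

```python
from typing import Optional


def mod11_check_digit(base: str) -> Optional[str]:
    """Return the check digit for an all-digit base number, or None
    when the check value would be 10 (assumed: such a base number is
    skipped rather than issued)."""
    if not (base.isascii() and base.isdigit()):
        raise ValueError("base number must consist of digits only")
    # Assumed weights: 2 for the rightmost digit, 3 for the next, etc.
    total = sum(int(d) * w for w, d in enumerate(reversed(base), start=2))
    check = (11 - total % 11) % 11
    return None if check == 10 else str(check)


def is_valid_number(number_with_check: str) -> bool:
    """True when the final byte is the correct check digit."""
    base, check = number_with_check[:-1], number_with_check[-1:]
    if not (base.isascii() and base.isdigit()):
        return False
    return mod11_check_digit(base) == check
```

With these weights, any single mis-keyed digit and any transposition
of two adjacent digits changes the weighted sum by an amount that is
not a multiple of 11, so the edit catches both kinds of keying error.
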

**Provision must be made for manual modifications to this
system of school numbering, either via a special entry point
in the production edit program, or through the use of a
coordinated file maintenance program.

Mention was made above regarding the interaction between
the operation of the production edit system and the subject
matter analyst. Several entries in Table 1 illustrate this.
One such kind of interaction is indicated for fields which
indicate, under the Action column, "reject on failure; write
to special hold file and request user action." These fatal
errors are associated with so-called required fields, fields
whose content is considered absolutely vital (e.g., #1, #2).
The action entry indicates the writing of the input record to
a special file rather than to the normal, intermediate edit
file. The record is held there until the subject matter analyst
(the "user") takes some action. Generally, this message is
used to signal an edit failure that can only be remedied by
human action. A message is written to a special output file
(see below) indicating the serious nature of the problem, and
as much information about the record as is available (e.g., dump
of all relevant data items and identifiers).

Other examples of edit/analyst interaction are illustrated
by the entries "write notification to the log" (e.g., field
#7), "write warning message", and "write warning message allow-
ing correction/update" (fields #8, #11). The edit program
uses several different output files for messages: fatal errors
("reject on failure" above) will be written to one file, and
the analyst must respond. Less critical errors (those associated
with warning messages) are written to a second file, and will
usually report that some default correction has been employed
that should be reviewed for appropriateness. Finally, some
messages are only notifications, for audit purposes, of fairly
safe actions that the edit system has taken. These should be
carefully examined early in the life of the system to ensure
that they are, indeed, safe; later they will only require
occasional scanning, and perhaps statistical treatment to
analyze error behavior among respondents.

As indicated above, it is the production edit system that
is most likely to execute most of the relational edits when
they are introduced. In addition to the value of such edits
in checking identification fields like LEA number, their appli-
cation to non-identification numeric fields is also important.
The use of a relational check paired with a matrix range check
for pupil/teacher ratio was discussed earlier in this report
and need not be repeated. Planned work by SAGE will lay the
groundwork for using such edits in future cycles. They are
mentioned again here, in the context of proposed edit specifi-
cations, primarily to remind computer analysts who translate
these specifications to keep such modifications in mind during
system design.

EDITS FOR PART VI: LEA NON-FISCAL SURVEY

This survey collects data on a large number of quantitative
variables for each of the approximately 16,000 LEAs in the
country. These data, consisting of tabulations of staff by sex
and function, of students by grade and type of school, and of
several other miscellaneous items, are collected from each of
the 58 SEAs, just as the school-level data (Part VI-A) are.
As is the case with the school universe survey, data are for the
most part collated from existing records by each SEA, and are
returned to NCES on hard-copy (the form is reproduced in
Appendix C) or magnetic tape (tape format specifications are
reproduced in Appendix D). Thus, three forms of edit are again
called for: on-line edit at data entry time to guard against
keying error, preliminary machine editing for the detection of
systematic errors, and production editing of batched records
prior to the final release of the data for analysis.

The survey data can be subdivided into two major classes,
identification and utility items (e.g., LEA name, address,
"new-closed" indicator) and enumerated data items (items 1
through 9 of the form). The edits recommended for the former
class are very similar to those performed for analogous com-
ponents of the Part VI-A survey, and are summarized in Table 2.
Four of these items are edited using checks identical to those
recommended for Part VI-A; three (street address, city, and
zip code) differ only in that the Part VI survey includes no
indicator field to accompany them, and so no consistency check
is required in Table 2. The last item in both surveys,
"NEWCLOSD", is subjected to different edits in Part VI. In
this survey the item is used to indicate the consolidation
("closing") or division ("opening") of LEAs themselves, and
is subjected only to a check for legal code. However, because
of the relative rarity of a change in status for an LEA, a

Table 2

Summary of Edits for Part VI, LEA Non-Fiscal Report:
Identification Fields

(Each entry reads: edit check -> action. Bracketed notes mark
cells that are illegible in the source copy.)

LEA number (7-byte numeric)
   DATA ENTRY
      Presence, filled numeric -> Reject on failure
      1st 2 bytes must be valid OE state code -> Reject on failure
   PRELIMINARY MACHINE EDIT
      Presence, filled numeric -> Count failures (blanks, zeroes)
      1st 2 bytes -> Frequency distribution
   PRODUCTION EDIT
      Valid LEA ID number -> Reject on failure; write to special
         hold file & request user action

Name of LEA (30-byte alphanumeric)
   DATA ENTRY
      [check and action illegible] (if not available, enter
         "UNKNOWN" [remainder illegible])
      Left-justify
   PRELIMINARY MACHINE EDIT
      Presence -> Count failures ("UNKNOWN", blank)
   PRODUCTION EDIT
      Presence -> Auto-correct on failure to "UNKNOWN" & write
         warning message allowing override/update
      Left-justify

Street address (13-byte alphanumeric)
   DATA ENTRY
      Left-justify
   PRELIMINARY MACHINE EDIT
      Presence -> Count failures (blank)
   PRODUCTION EDIT
      Left-justify

CITY / City name (13-byte alphanumeric)
   DATA ENTRY
      Left-justify
   PRELIMINARY MACHINE EDIT
      Presence -> Count failures (blank)
   PRODUCTION EDIT
      Left-justify

State abbreviation (2 bytes)
   DATA ENTRY
      Presence -> Auto-correct on failure using 1st 2 bytes of
         field #1 (OE code) & notify operator
      Legal code (U.S. Postal Service) -> Reject on failure
   PRELIMINARY MACHINE EDIT
      Presence -> Count failures (blank)
      Legal code -> Frequency distribution
   PRODUCTION EDIT
      Presence -> Auto-correct on failure; generate using 1st 2
         bytes of field #1 (OE code)
      Legal code (U.S. Postal Service) -> Auto-correct as above;
         write warning message allowing override/update

ZIP code (numeric; field name and length illegible)
   DATA ENTRY
      Filled numeric or blank -> Reject on failure
   PRELIMINARY MACHINE EDIT
      Presence, filled numeric -> Count failures (blanks, zeroes)
   PRODUCTION EDIT
      Presence, filled numeric or blank -> Auto-correct to blank
         on failure; write notification to log

SEA-ID / SEA id for LEA (20-byte alphanumeric)
   DATA ENTRY
      Left-justify
   PRELIMINARY MACHINE EDIT
      Presence -> Count failures (blank)
   PRODUCTION EDIT
      Left-justify

NEWCLOSD / LEA status (1 byte)
   DATA ENTRY
      Legal code (blank, "N", "C") -> Reject on failure
   PRELIMINARY MACHINE EDIT
      Legal code -> Frequency distribution
   PRODUCTION EDIT
      Legal code (blank, "N", "C") -> Write warning message to
         user on any non-blank entry

warning message should be written whenever the field is non-
blank; note also that new LEAs without NCES identification
numbers are written onto a special hold file during the produc-
tion edit phase so that an identification number can be assigned.

Enumerated data items in the Non-Fiscal Survey include
123 fields. Associated with each of these numeric fields is
an indicator/estimator field which may take on one of four
codes: blank or zero when the number in the accompanying data
field is an active value, [code illegible in source] when the
datum in the field is an estimate, and "N" when the datum in
the field is not available. In this last case, the data field
itself is to contain all zeroes (an NCES specification). Thus,
the first step in editing these numeric data items is to check
the consistency of each data field with its associated
indicator/estimator field. At data entry time, the following
rules should be applied:

• If the indicator/estimator is blank, zero, or [estimate
code, illegible in source] and the data field contains a
non-zero numeric value, allow the field to pass edit;

• If the indicator/estimator is blank, zero, or [estimate
code] and the data field contains the quantity zero, request
operator confirmation in order to pass the field
(thus trapping for missing data when no "N" was keyed
into the indicator/estimator field);

• If the indicator/estimator is blank, zero, or [estimate
code] and the data field contains non-numeric data or is
blank, reject the record (the operator, of course,
may correct and resubmit it if the problem is a keying
error);

• If the indicator/estimator contains an "N" and the
data field contains the quantity zero, allow the field
to pass edit;

• If the indicator/estimator contains an "N" and the data
field is other than zero-filled, reject the record;

• If the indicator/estimator contains any code other
than a blank, zero, or "N", reject the record.

Finally, for any record that passes this edit, the data-entry
system should right-justify and zero-fill the data field.
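
The data-entry rules above can be sketched as a small decision
function. This is an illustration only. Because the code used for
estimated data is illegible in the source, the constant
ESTIMATE_CODE below is a hypothetical placeholder; the names
Outcome, check_field, and normalize are likewise invented for the
example, and data fields are assumed to arrive as character strings.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    CONFIRM = "request operator confirmation"
    REJECT = "reject record"


ESTIMATE_CODE = "E"   # hypothetical: the real code is not legible in the source
NOT_AVAILABLE = "N"
VALUE_CODES = {"", "0", ESTIMATE_CODE}   # blank, zero, or estimate


def check_field(indicator: str, data: str) -> Outcome:
    """Apply the data-entry consistency rules to one data field and
    its indicator/estimator field."""
    code = indicator.strip()
    field = data.strip()
    if code in VALUE_CODES:
        if not (field.isascii() and field.isdigit()):
            return Outcome.REJECT            # blank or non-numeric data
        # A reported zero may be missing data keyed without an "N".
        return Outcome.CONFIRM if int(field) == 0 else Outcome.PASS
    if code == NOT_AVAILABLE:
        zero_filled = field != "" and set(field) == {"0"}
        return Outcome.PASS if zero_filled else Outcome.REJECT
    return Outcome.REJECT                    # illegal indicator code


def normalize(data: str, width: int) -> str:
    """Right-justify and zero-fill a field that has passed the edit."""
    return data.strip().zfill(width)
```

Returning an outcome rather than acting directly keeps the rule set
reusable by the production edit, which applies the same standards
but routes failures to different output files.
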

The preliminary machine edit for these fields should be a
simple cross-tabulation of indicator/estimator code by the
three relevant field conditions (numeric-zero, numeric-non-zero,
or non-numeric) for each of the 123 items. The production edit
program should apply the same standards as those applied at
data entry.*

Once the data have passed this initial set of consistency
checks, attention must turn to the consistency among data fields
(e.g., the checking of arithmetic, the reasonableness of the
values). This is the primary means of verifying that the
numbers themselves are accurate. Some very powerful consistency
checks are available for the first item on the survey (staff);
somewhat less powerful checks are available for the remaining
items 2 through 9. Because these latter checks are easier to
describe, we shall deal with them first.

The second item on the survey is an enumeration of member-
ship by grade and type of school (elementary and secondary).
Assuming that data are available, the sum of the elementary
school membership fields (2A, 2B, 2C, 2D, 2E, 2F, 2G, 2H, 2Ia,
2Ja, and 2Oa) should equal the elementary-total field (2Pa).
Upon data entry, the following procedure should be applied:
if the sum is equal to the total field, pass the item; if the
sum is within ±5% of the total field but not equal to it,
auto-correct the total field to equal the computed sum, and
ask the operator for confirmation; if the sum deviates more
than ±5%, reject the record.** The same procedure should be



*If the preliminary edit reveals that some consistent problem
regarding the use of these codes exists in a state's magnetic
tape submission, one might turn off these consistency edits
instead of writing a special transformation/translation pro-
gram. The real purpose of the indicator/estimator fields
is the detection of estimated data, but this purpose should
probably be held subordinate to the goal of processing the
data in a timely fashion.

**A better standard than "±5%" would be "±5% or within ±2,
whichever is larger." This permits small totals with small
errors to be corrected automatically. The discussion above
is limited to "±5%" only as a matter of convenience.




applied during production edits, except that when a 5% deviation
is auto-corrected, a notification should be written to the log,
and when a deviation in excess of 5% is found, auto-correction
should be applied (total set equal to computed sum) and a
warning message sent to the user. All of these auto-corrections
must be applied with caution: if the edit is performed when
the individual grade enrollments are missing (indicators are
"N"), the data fields will sum to zero (since they are always
zero-filled). Thus, this check-sum-and-auto-correct procedure
must only be applied when all requisite data are present.
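
The check-sum-and-auto-correct procedure, including the guard
against "N" indicators, might be sketched as follows. The function
name, the stage labels, and the short action codes returned are
hypothetical; the deviation is measured against the reported total,
as the text describes.

```python
from typing import Optional, Sequence, Tuple


def check_total(details: Sequence[int], indicators: Sequence[str],
                reported_total: int, stage: str,
                pct: int = 5) -> Tuple[str, Optional[int]]:
    """Compare the sum of the detail fields with the reported total.

    Returns (action, total to store). Hypothetical action codes:
    "skip", "pass", "confirm" (auto-correct and ask the operator),
    "reject", "log" (auto-correct, notify log), and
    "warn" (auto-correct, write warning message).
    """
    if stage not in ("data_entry", "production"):
        raise ValueError(f"unknown stage: {stage!r}")
    # Never sum zero-filled placeholders for unavailable data.
    if any(code.strip().upper() == "N" for code in indicators):
        return ("skip", None)
    computed = sum(details)
    if computed == reported_total:
        return ("pass", reported_total)
    # Integer arithmetic avoids rounding trouble at the 5% boundary.
    within = abs(computed - reported_total) * 100 <= pct * abs(reported_total)
    if stage == "data_entry":
        return ("confirm", computed) if within else ("reject", None)
    return ("log", computed) if within else ("warn", computed)
```

The "±5% or within ±2, whichever is larger" standard suggested in
the footnote could be added by also accepting any absolute
difference of 2 or less when computing the tolerance test.
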

The same kind of data entry and production edits are called
for in dealing with the secondary school membership fields
(sum of fields 2Ib, 2Jb, 2K, 2L, 2M, 2N, and 2Ob should equal
the total field, 2Pb). In addition, entries for seventh-grade
elementary and seventh-grade secondary (2Ia and 2Ib) should be
checked against one another; only one should have data (for
most school districts). If both have non-zero values, the
data entry program should request operator confirmation, and
the production edit program should write a notification to the
log. The same kind of check should be applied to the two eighth-
grade fields (2Ja and 2Jb).

The preliminary machine edit program should print the
five highest and lowest entries and the mean for each field in
item 2. In addition, the elementary and secondary school
membership sums should be computed, the appropriate total
fields subtracted, and the difference evaluated for each LEA:
the five highest and five lowest differences and the associ-
ated totals should be printed, along with the mean differences
between computed sums and reported totals. Finally, frequency
distributions on the associated indicator/estimator fields
should be printed.
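
A profile of this kind is simple to produce. The sketch below shows
one way to extract the five highest and five lowest values and the
mean for a field, and the same profile for the differences between
computed sums and reported totals; the function names and the input
layout (one list of detail values and one reported total per LEA)
are assumptions made for the example.

```python
import heapq
from statistics import mean
from typing import Dict, List, Sequence


def field_profile(values: Sequence[float], k: int = 5) -> Dict[str, object]:
    """Return the k highest values, the k lowest values, and the mean.
    An empty input raises statistics.StatisticsError."""
    return {
        "highest": heapq.nlargest(k, values),
        "lowest": heapq.nsmallest(k, values),
        "mean": mean(values),
    }


def difference_profile(details_by_lea: Sequence[Sequence[float]],
                       reported_totals: Sequence[float],
                       k: int = 5) -> Dict[str, object]:
    """Profile (computed sum minus reported total) across LEAs."""
    diffs: List[float] = [sum(d) - t
                          for d, t in zip(details_by_lea, reported_totals)]
    return field_profile(diffs, k)
```

A cluster of large differences with the same sign would point to a
systematic problem in a submission rather than to isolated keying
errors, which is the purpose of the preliminary edit.
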

Items 3, 4, and 5 are all similar in content, and should
be subjected to the same arithmetic check: providing the data
are available, the sum of the first two columns (e.g., 3A and
3B) should be equal to the total column (e.g., 3C). When it
is not, but is within ±5%, the data entry program should set
the total to the computed sum and request operator confirmation,
while the production edit program should make the same correction
and notify the log; if the deviation is greater than 5%, the
data entry program should reject the record, while the produc-
tion program should auto-correct and write a warning message.

In addition to this common arithmetic check, one additional
check should be applied to item 3 alone. Item 3C should be
non-zero if students are reported in the twelfth grade (2Pb)
and both data values are present. Write a warning message on
failure (data entry and production edit).

The preliminary machine edit program should compute the 
five highest and five lowest values and the mean for: the 
data fields, the computed sum for each item, and the differences 
between the reported and computed totals. The preliminary edit 
program should also produce frequency distributions for each of 
the relevant indicator/estimator fields; and a crosstab presence 
check between fields 3C and 2Pb. 

For items 6, 7, 8, and 9 no edit checks of any value at
the individual record level are possible, since there is little
information internal to the survey against which to check them;
powerful edits on these fields will have to await the next
survey cycle, when relational edits become available.* One
step should be taken this year to try to deal at least with
extreme instances of error: the preliminary machine edit pro-
gram should produce a frequency distribution of the relevant
indicator/estimator fields, as well as the 20 highest and
lowest values, the mean, and the standard deviation for each
of the data fields.

Returning to item 1, much more powerful edit checks may 
be applied to these data, and more sophisticated methods of 
auto-correction for errors are available. This results from 



*Item 8 could be checked against the school universe file this
year, but this would probably not be cost-justified.

the relative redundancy built into the item, the number of
interlocked detail, sub-total and total cells that mutually
constrain the values which each may take on. The approach still
depends on the requirement that a reported sum of two or more
fields should be equal to the computed sum; the factor which
distinguishes item 1 is that every data field is involved in
at least two such arithmetic comparisons. This acts as a con-
straint, and guarantees the identification and accurate correction
of any record with one error among the item 1 fields, and a
high probability of accurate detection and correction of records
with two or more errors. Before laying out the complete system,
a simple example will be used to illustrate the method.

Figure 2a depicts a portion of item 1 with illustra-
tive data. It includes three of the thirty rows of item 1,
corresponding to part M. Each row is divided (as is the rest
of item 1; see Appendix C) into three columns, "male",
"female", and "total". The illustrative data add correctly,
both across rows (job total and grand total for the four detail
fields) and down columns (sex total, and grand total for the
four detail fields). For the examples below, assume that these
are the "true" data, which might be erroneous when the data
record is ultimately received by NCES. Consider now Figure 2b.
When the addition checks are performed on this array of data,
we find that the third row does not add up, marked by the arrow
to the right of the table, and the second column does not add
up, indicated by the arrow beneath the table. One cell entry
has been transcribed in error, and it can easily be seen that
this cell is in the third row, second column. Further, the
arrows "pointing" to the bad row and the bad column intersect
at the cell containing the bad datum. Moreover, only one value
can replace the bad datum and satisfy both of the addition
check failures: replacing the "7" in the critical cell with
the correct value, "6", satisfies both additions. Figure 2c
illustrates this phenomenon for a similar situation, where
one cell contains an error, and this cell is located in the
single row and the single column that fail to add properly.



                                 Male    Female    Total
   Assignment/function            (a)      (b)      (c)

a) correct:
   M. Aides
      1. Instructional aides       1        2        3
      2. Other aides               3        4        7
      3. TOTAL (M 1 and 2)         4        6       10

b) one error:
   M. Aides
      1. Instructional aides       1        2        3
      2. Other aides               3        4        7
      3. TOTAL (M 1 and 2)         4        7       10   <--
                                            ^

c) one error:
   M. Aides
      1. Instructional aides       1        3        3   <--
      2. Other aides               3        4        7
      3. TOTAL (M 1 and 2)         4        6       10
                                            ^

d) two errors:
   M. Aides
      1. Instructional aides       1        2        4   <--
      2. Other aides               3        4        8   <--
      3. TOTAL (M 1 and 2)         4        6       10
                                                     ^

e) two errors:
   M. Aides
      1. Instructional aides       1        5        3   <--
      2. Other aides               6        4        7   <--
      3. TOTAL (M 1 and 2)         4        6       10
                                   ^        ^

(Arrows mark the rows and columns that fail the addition check.)

Figure 2. Auto-Correction Method for Item 1 - Part VI

In fact, it can be proven that, for any table with r rows
and c columns where one of the rows is generated by summing the
other rows and one of the columns is generated by summing the
other columns, a change in any single value in the table will
lead to exactly two addition check failures, one for a row and
one for a column, and that the cell at the intersection of this
row and column must be the cell that is inconsistent. Thus,
for any table with exactly one error it is possible to locate
and correct the error with perfect confidence. Conversely,
any table that has more than one row or more than one
column that does not add up correctly contains at least two
bad cell entries.
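
The single-error case can be sketched in a few lines. The layout
assumed here (a list of rows whose last entry is the row total, with
the last row holding the column totals) and the function names are
choices made for the illustration, not part of the specification.

```python
from typing import List, Optional, Tuple

Table = List[List[int]]


def row_ok(t: Table, i: int) -> bool:
    """Row check: the detail cells of row i sum to its total cell."""
    return sum(t[i][:-1]) == t[i][-1]


def col_ok(t: Table, j: int) -> bool:
    """Column check: the detail cells of column j sum to its total."""
    return sum(row[j] for row in t[:-1]) == t[-1][j]


def locate_single_error(t: Table) -> Optional[Tuple[int, int, int]]:
    """Return (row, column, corrected value) when exactly one row and
    one column fail and a single replacement satisfies both checks;
    otherwise return None."""
    bad_rows = [i for i in range(len(t)) if not row_ok(t, i)]
    bad_cols = [j for j in range(len(t[0])) if not col_ok(t, j)]
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        return None
    i, j = bad_rows[0], bad_cols[0]
    row = t[i]
    if j == len(row) - 1:
        value = sum(row[:-1])                       # bad cell is a row total
    else:
        value = row[-1] - (sum(row[:-1]) - row[j])  # bad cell is a detail
    fixed = [list(r) for r in t]
    fixed[i][j] = value
    # The same value must also repair the failing column.
    return (i, j, value) if col_ok(fixed, j) else None
```

Applied to the data of Figure 2b, the function points to the third
row, second column, and proposes the value 6, as in the text.
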

What about tables with exactly two errors? Here the
situation is more complex. If the two errors are located in
the same row or column, the cells can be identified and unique
corrections are computable (Figure 2d); the key to identifying
this situation is noting that there are exactly two rows and
no more than one column, or two columns and no more than one
row that fail to add properly. Two errors that are not in
the same row or column are not uniquely identifiable or correct-
able; such situations may be distinguished from those above by
the combination of two rows and two columns which fail the
addition check. With more than two errors, not even the kind
of error (e.g., one cell, two cells in same row or column, two
cells in different rows and columns) may be identified.
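
The failure patterns described in the last two paragraphs can be
told apart by counting the rows and columns that fail the addition
check. The sketch below assumes the same table layout as Figure 2
(last entry of each row is the row total, last row holds the column
totals); the function name and the labels it returns are
hypothetical.

```python
from typing import List


def classify_failures(t: List[List[int]]) -> str:
    """Classify a table by its counts of failing rows and columns."""
    n_rows = sum(1 for row in t if sum(row[:-1]) != row[-1])
    n_cols = sum(1 for j in range(len(t[0]))
                 if sum(row[j] for row in t[:-1]) != t[-1][j])
    if n_rows == 0 and n_cols == 0:
        return "consistent"
    if n_rows == 1 and n_cols == 1:
        return "one cell in error"
    if n_rows == 2 and n_cols <= 1:
        return "two cells in the same column"
    if n_cols == 2 and n_rows <= 1:
        return "two cells in the same row"
    if n_rows == 2 and n_cols == 2:
        return "two cells in different rows and columns"
    return "not identifiable"
```

Only the first four outcomes lead to an automatic correction; the
last two call for the fallback procedure or manual review.
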

The point is that we have the capability not only to iden-
tify errors (by addition check edits) but also, in a substan-
tial number of cases, to correct them automatically with a high
degree of confidence in the accuracy of the correction. How
then might this system be applied to all of item 1? Begin by
noting that item 1 may be partitioned into four sub-sections or
partitions which themselves meet the criteria spelled out above
(i.e., r by c tables with a sum-column and a sum-row). Figure 3
illustrates these partitions, outlined with heavy lines. One
further partition in Figure 3 includes the "total" rows from

[Figure 3 reproduces the item 1 staff table of the survey form
(see Appendix C), headed "1. Full-time equivalent number of
persons employed by this agency during the payroll period
including October 1, 1978. REPORT TO THE NEAREST TENTH", with
columns Male (a), Female (b), and Total (c). The four inner
partitions are outlined with heavy lines, and the rows belonging
to the outer partition are marked with asterisks at the left.
The outlines and markings are not legible in this copy.]

Figure 3. Partitions of Item 1, Part VI.

each of the four partitions noted above plus all of the rows not
included in one of the four other partitions; these are marked
on the figure by the asterisks to the left of each included row.
Thus, there are five partitions of item 1 which may be subjected
to the kind of edit treatment described above.

Each partition should be checked in the following manner:

• If the partition meets the row and column addition
checks, proceed to the next partition;

• If the addition checks indicate one cell in error,
compute the size of the error (computed sum to reported
sum). If it is within ±5%, auto-correct the cell
entry, note the change on the log, and proceed to the
next partition; if it is more than 5%, auto-correct
the cell, write a warning message, and proceed;

• If the addition checks indicate two cells in the same
row or column are in error, proceed as with one error;

• If the addition checks do not uniquely isolate multiple
errors, examine the topmost row which does not add
properly. If the total for this row can be adjusted
by ±5% to bring the row into balance, do so and
recheck the table. If it still does not check, correct
any uniquely identifiable cell error (based on the
second check) or correct the topmost row total that
fails to check by up to ±5%, and recheck. If the
partition then meets the check, proceed to the next
partition; if not, reject the record. In any event,
write a warning message to the user and set an edit
system flag: if any other checks are generated during
subsequent editing of item 1 on that record, the record
should be rejected and held for manual checking, and a
warning message should be written for the user.*

The order in which these partitions should be checked is 
"inside-out": inside partitions A, B, C, D, followed by the 
outside partition marked by asterisks (Figure 3). Fields which 
are changed by this edit should have their indicator/estimator 
fields changed to "X" to record the event. 



*Otherwise the program could spend a week making up spurious
corrections for one really bad record.

This kind of edit consumes computer resources fairly
heavily, and may not be feasible in many on-line data entry
systems; it should be included as a component of the production
edit system. Two other edit checks should also be applied to
item 1, as part of both data entry and production edits:

• If total membership (item 2Pa plus 2Pb) exceeds 10,000,
items 1Fc, 1Gc, 1Hc, 1Jc, 1K4c, 1Lc, 1M3c, 1Nc, and
1Oc should all have non-zero entries (if data are
available);

• Field 1I6c (total teachers) should be greater than
1Pc (total personnel) minus 1I6c (i.e., there should
be more teachers than any other kind of personnel).

Records that fail these edits at data entry should be rejected;
those that fail during production edit should be rejected and
written to a special hold file for later correction. If they
are, indeed, correct (but unusual), the production edit must
be arranged to allow the user to force the record past these
edits.
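
These two plausibility checks can be sketched as follows. The
record is assumed, for illustration only, to be a dictionary keyed
by the field labels used in the text, holding values for which data
are available; the function name and the wording of the messages
are hypothetical.

```python
from typing import Dict, List

# Staff totals expected to be non-zero in a large LEA.
REQUIRED_WHEN_LARGE = ("1Fc", "1Gc", "1Hc", "1Jc", "1K4c",
                       "1Lc", "1M3c", "1Nc", "1Oc")


def item1_plausibility_failures(rec: Dict[str, float]) -> List[str]:
    """Return a message for each of the two checks that the record fails."""
    failures: List[str] = []
    if rec["2Pa"] + rec["2Pb"] > 10_000:
        zero = [f for f in REQUIRED_WHEN_LARGE if rec[f] == 0]
        if zero:
            failures.append("zero staff entries in large LEA: " + ", ".join(zero))
    teachers = rec["1I6c"]
    # Teachers should outnumber all remaining personnel.
    if not teachers > rec["1Pc"] - teachers:
        failures.append("total teachers not greater than other personnel")
    return failures
```

An empty list means the record passes; a production edit would write
any record with a non-empty list to the hold file unless the user
has forced it past these edits.
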

While these edits are all implementable this year during
the current cycle of Part VI, there is the potential, as there
was for Part VI-A, for implementing relational/longitudinal
checks next year. With the exception of item 1, such checks
would significantly improve the power of error detection; in
the case of item 1 the improvement is not as clear, given the
powerful edit technique that will already be in place. It
is recommended that the actual performance of the two alterna-
tive approaches be compared empirically this summer (using what
data are available from last year's tryout of Part VI for the
relational checks) to determine whether both are justified when
used simultaneously, or, if not, which should be preferred for
future use.

APPENDIX A



This appendix contains a copy of the survey form used by
NCES for the 1978-79 cycle of CCD Part VI-A, the Universe of
Public Schools survey.

[Survey form, reconstructed from a largely illegible copy; only
the legible elements are listed.]

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
EDUCATION DIVISION
WASHINGTON, D.C. [ZIP code illegible]

[COMMON CORE] OF DATA, PART VI-A. UNIVERSE OF PUBLIC SCHOOLS

FORM APPROVED
O.M.B. NO. [illegible]

LOCAL EDUCATION AGENCY IDENTIFICATION
   NCES
   SEA
NAME OF LOCAL EDUCATION AGENCY

INSTRUCTIONS [mostly illegible; the legible portions appear to
ask that continuation sheets carry the next higher line number,
that line numbers run in sequence up to the highest number needed
for the agency, and that the number of schools listed equal the
number of schools entered on Form 2393-2]

Codes to be used:
   School type:  1 = elementary
                 2 = middle
                 3 = secondary
                 4 = combined elementary/secondary
                 5 = special education for handicapped
                 6 = vocational/technical
                 7 = alternative
   Grade span:   Use 01, 02, 03, ... 12 for numbered grades. Use
                 PK for prekindergarten, KG for kindergarten. If
                 the school is graded, enter the appropriate grade
                 designations for the lowest and highest grades
                 offered. If the school is ungraded, enter [code
                 partly illegible, beginning "UG"] in the grade
                 span column.

Column headings (legible): LINE NO.; NAME OF SCHOOL; STREET
ADDRESS; SCHOOL TYPE; FTE OF TEACHERS (REPORT TO THE NEAREST
TENTH); MEMBERSHIP

[Form number and date illegible] (FM Control No. 70)

APPENDIX B



This appendix contains keying instructions and the tape
record layout prepared by NCES for CCD Part VI-A, 1978-79.

[Two keying instruction sheets follow in the original. Their
printed headings read:

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
OFFICE OF EDUCATION

DATA PROCESSING
SYSTEM DOCUMENTATION AND
OPERATING PROCEDURES MANUAL

SYSTEM TITLE:            SYSTEM ID: CCD
Subsection(s): .01       Change Notice:
212.01 KEYING INSTRUCTIONS
JOB TITLE:               SOURCE DOCUMENT:
COMMENTS                 PAGE __ OF __

The handwritten entries on both sheets are illegible in this
copy. Legible fragments include instructions to left-justify
fields, to leave fields blank, a "not available" entry, and a
reference to the P.O. state abbreviation.]

APPENDIX C



This appendix contains a copy of the survey form used by
NCES for the 1978-79 cycle of CCD Part VI, the Local Education
Agency Nonfiscal survey.

DEPARTMENT OF HEALTH, EDUCATION, & WELFARE
EDUCATION DIVISION
NATIONAL CENTER FOR EDUCATION STATISTICS

FORM APPROVED
O.M.B. NO. SI R1227

DUE DATE: January 15, 1979

COMMON CORE OF DATA - PART VI. LOCAL EDUCATION AGENCY NONFISCAL REPORT

This report is authorized by law (20 U.S.C. 1221e-1). While you are not required to
respond, your cooperation is needed to make the results of this survey comprehensive,
accurate, and timely.

ID numbers:   NCES ________   SEA ________
Name of agency
Street address
City, State, ZIP

1. Full-time equivalent number of persons employed by this agency during the payroll
period including October 1, 1978.

REPORT TO THE NEAREST TENTH

                                               Male    Female    Total
   Assignment/function                          (a)      (b)      (c)

   A. Superintendents
   B. Other officials/administrators
   C. Principals
      1. Elementary
      2. Secondary
      3. Unclassified
   D. Assistant principals
      1. Elementary
      2. Secondary
      3. Unclassified
   [E.] Total of principals & asst. principals
   [F.] Curriculum specialists
   G. Library/media specialists
   [H.] Psychological personnel

   (Row letters in brackets are not legible on the form and are
   supplied from the lettering sequence.)

NCES FORM 2393-2, 9/78 (FM Control No. 76)
I. Full-time equivalent number of persons employed by this agency during the payroll
period including October 1, 1978 (continued)



Assignment/function                          Male (a)    Female (b)    Total (c)

I. Classroom teachers
   1. Prekindergarten
   2. Kindergarten
   3. Other elementary
   4. Secondary
   5. Unclassified
   6. TOTAL (I 1 thru 5)
J. Other teachers, e.g., radio/TV, etc.
K. Guidance & counseling personnel
   1. Elementary
   2. Secondary
   3. Unclassified
   4. TOTAL (K 1 thru 3)
L. Other professional personnel
M. Aides
   1. Instructional aides
   2. Other aides
   3. TOTAL (M 1 and 2)
N. Office/clerical personnel
O. Other nonprofessional personnel
P. Total, all personnel
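
The total lines on this form (I.6, K.4, M.3, and the Total column) lend themselves to a simple arithmetic consistency edit. The following is a minimal sketch in Python and is not part of the NCES specifications; the function names, the line codes such as "M3", and the dictionary layout are assumptions made for illustration. Counts are held as Decimal values because the form asks for tenths.

```python
from decimal import Decimal

# Hypothetical layout: each line code maps to (male, female, total) FTE
# counts, reported to the nearest tenth as on the form.
SUBTOTALS = {
    "I6": ["I1", "I2", "I3", "I4", "I5"],  # TOTAL (I 1 thru 5)
    "K4": ["K1", "K2", "K3"],              # TOTAL (K 1 thru 3)
    "M3": ["M1", "M2"],                    # TOTAL (M 1 and 2)
}


def check_staff_totals(rows):
    """Return a list of messages for lines that fail an arithmetic edit."""
    failures = []
    # Row edit: Male (a) + Female (b) should equal Total (c).
    for code, (male, female, total) in sorted(rows.items()):
        if male + female != total:
            failures.append(f"{code}: (a) + (b) != (c)")
    # Column edit: each subtotal line should equal the sum of its detail lines.
    for total_code, parts in SUBTOTALS.items():
        if total_code not in rows or any(p not in rows for p in parts):
            continue  # detail not reported by level; nothing to compare
        for col, label in enumerate(("a", "b", "c")):
            expected = sum((rows[p][col] for p in parts), Decimal("0"))
            if rows[total_code][col] != expected:
                failures.append(
                    f"{total_code}: column ({label}) != sum of detail lines"
                )
    return failures


def fte(*values):
    """Convenience helper: build a tuple of Decimal values from strings."""
    return tuple(Decimal(v) for v in values)


example = {
    "M1": fte("2.5", "10.0", "12.5"),
    "M2": fte("1.0", "3.5", "4.5"),
    "M3": fte("3.5", "13.5", "18.0"),  # column (c) should be 17.0
}
print(check_staff_totals(example))
```

In this example the M.3 line fails both the row edit and the column edit, since its reported total of 18.0 matches neither 3.5 + 13.5 nor 12.5 + 4.5. Subtotals whose detail lines are absent are skipped, which mirrors the form's allowance for reporting totals only.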
II. Number of pupils in membership on October 1, 1978, or nearest date thereto when a
fall membership count is taken.



If the as-of date is not October 1, please specify the date:



Grade level             Elementary (a)
A. Prekindergarten
B. Kindergarten
C. First
D. Second
E. Third
F. Fourth
G. Fifth
H. Sixth

Grade level             Elementary (a)    Secondary (b)
I. Seventh
J. Eighth
K. Ninth
L. Tenth
M. Eleventh
N. Twelfth
O. Unclassified
P. TOTALS (A thru O)







III. Number of 12th grade graduates from the
regular day school program (including
summer session) during the 1977-78
school year.

     Male    Female    Total









IV. [Item text largely illegible in this copy; the surviving words are
"... on or about October ... 1978 ..."]

     Public School    Private School    Total

V. Number of vehicles used to transport
pupils owned wholly or jointly by the
agency on or about October 1, 1978
[Large = more than 15 passenger]
[Small = less than 16 passenger]

     Large    Small    Total
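
The two size definitions in item V are complementary for whole-number capacities, so a single threshold separates them. The short Python sketch below illustrates that rule together with a Large + Small = Total check; the function names are hypothetical and the sketch is not drawn from the NCES edit specifications.

```python
def vehicle_size(passenger_capacity):
    """Classify a vehicle by the form's rule: Large is more than 15
    passengers, Small is fewer than 16."""
    if passenger_capacity < 0:
        raise ValueError("capacity cannot be negative")
    return "Large" if passenger_capacity > 15 else "Small"


def check_vehicle_counts(large, small, total):
    """Edit check: the Total column should equal Large + Small."""
    return large + small == total


print(vehicle_size(15), vehicle_size(16))
print(check_vehicle_counts(12, 3, 15))
```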

VI. Total area enclosed within the agency's boundaries in square miles

VII. Number of members of the board of education

VIII. Number of schools operated by this agency on October 1, 1978

IX. Number of scheduled days in the regular school term when
pupils are expected to be in attendance



SEE PAGE 4 FOR SPECIAL INSTRUCTIONS
SPECIAL INSTRUCTIONS TO ACCOMPANY NCES FORM 2393-2



Report the full-time equivalent number of persons employed to the
nearest tenth.

If personnel within selected assignment categories cannot be reported
by level, report the appropriate totals only.

If the number of principals and assistant principals cannot be
reported separately, enter "N.A." on page 1, lines C and D, and enter
the total number of principals and assistant principals on the same
page, line E.

If the number of aides cannot be reported separately by type,
report the total number of aides on line M.3.

These forms should be returned to: NCES/DESES/ISB 

Federal Office Building No. 6 
400 Maryland Avenue, SW 
Washington, D.C. 20202 
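
The "N.A." rule for lines C, D, and E above can be expressed as a small edit check. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes each argument is the Total column value for its line, given either as a number or as the string "N.A.".

```python
NOT_AVAILABLE = "N.A."


def check_principal_lines(c_total, d_total, e_total):
    """Apply the form's rule for lines C, D, and E (Total column).

    Each argument is either a number or the string "N.A.".
    Returns a list of edit-failure messages (empty if the lines pass).
    """
    failures = []
    c_missing = c_total == NOT_AVAILABLE
    d_missing = d_total == NOT_AVAILABLE
    if e_total == NOT_AVAILABLE:
        failures.append("line E must carry a total")
    elif c_missing != d_missing:
        # The instructions ask for "N.A." on both lines when principals and
        # assistant principals cannot be separated.
        failures.append("lines C and D should both be N.A. or both numeric")
    elif not c_missing and round(c_total + d_total, 1) != round(e_total, 1):
        # Values are reported to the nearest tenth, so compare at that precision.
        failures.append("line E != line C + line D")
    return failures


print(check_principal_lines("N.A.", "N.A.", 6.0))
print(check_principal_lines(4.0, 1.5, 6.0))
print(check_principal_lines("N.A.", 1.5, 6.0))
```

The first call passes, the second reports that line E does not equal C plus D, and the third reports that only one of lines C and D was marked "N.A.".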



APPENDIX D



This appendix contains keying instructions and the tape 
record layout prepared by NCES for CCD Part VI, 1978-79. 
DATA PROCESSING
SYSTEM DOCUMENTATION AND
OPERATING PROCEDURES MANUAL

SYSTEM TITLE:
SYSTEM ID: CCD
Section & Page:
Change Notice:

212.XX [title illegible]
212.01 KEYING INSTRUCTIONS

CHARGE:
SYSTEM ID: CCD
JOB TITLE:
SOURCE DOCUMENT: [illegible]
COMMENTS:
PAGE __ OF __

CARD NUM.    REF. NO.    FIELD TITLE    COLUMNS    INSTRUCTIONS

[The handwritten entries on this form page are illegible in this copy.]
[The following keying instruction pages repeat the same Office of Education
form header (Data Processing System Documentation and Operating Procedures
Manual, 212.01, System ID: CCD). Their handwritten field entries are
illegible in this copy.]